Multidisciplinary Column: An Interview with Odette Scharenborg

Odette, could you tell us a bit about your background, and what the road to your current position was?

Dr Odette Scharenborg, Associate professor and Delft Technology Fellow, SpeechLab/Multimedia Computing Group, Delft University of Technology

In high school, I enjoyed both languages and science topics such as physics, chemistry and biology. When researching what I wanted to study, I came across “Language, Speech, and Computer Science” at Radboud University, Nijmegen, the Netherlands, which sounded like, and indeed was, an interesting combination of both languages and science topics. Probably inspired by one of my favourite TV series when I was younger, Knight Rider, which featured a car with which you could communicate through speech, I focused on speech technology from early on.

After obtaining my university degree in 2000, I was offered a PhD position in the same department where I had pursued my studies, on another interdisciplinary topic: computational modelling of human speech processing. My PhD project (2001-2005) combined theories about human speech processing (psycholinguistics) and tools and approaches from automatic speech recognition (which itself is more or less at the crossroads of electrical engineering and computer science) in order to learn more about how humans process speech and improve automatic speech recognition (i.e., the conversion of speech into text).

After obtaining my PhD (in 2005), I went to the Speech and Hearing group in the Department of Computer Science at the University of Sheffield, UK, for a visiting post-doc position (funded by a Dutch Science Foundation (NWO) Talent Scholarship). I then returned to Radboud University for a 3-year post-doc position (funded by an NWO Veni personal fellowship) on a new project on computational modelling of human speech processing. After this project, I felt that, having read so much about the theories of how humans process speech, I really wanted to know how researchers actually came to these theories. So, in the next few years, my research focused on human speech processing: first at the Max Planck Institute for Psycholinguistics, where I was trained as a psycholinguist, and subsequently, funded by an NWO Vidi personal grant, again at Radboud University, where I became Associate Professor.

Towards the end of my Vidi-project (in 2016), I started to miss the computer science component of my earlier research and decided to try to move back into automatic speech recognition. I had an idea, met two amazing speech researchers who loved my idea, and we decided to collaborate. This collaboration (still ongoing) has allowed me to move back into the field of automatic speech recognition that at that time was rapidly changing due to the rise of deep learning.

In 2018, my Vidi project and contract at Radboud University ended, and I became unemployed. I was then headhunted by a company on automatic speech recognition for health applications. However, I felt that I wanted to stay in academia. Luckily for me, shortly after joining the company, Delft University of Technology offered me a Delft Technology Fellowship, and I joined TU Delft in June 2018, where I’ve since then worked as an Associate Professor of Speech Technology.

How important is interdisciplinarity in your research on speech?

As is probably clear from my road so far, I am an interdisciplinary researcher. The field of automatic speech recognition is already interdisciplinary in that it combines electrical engineering and computer science. However, in my research, I use my knowledge about sounds and sound structures (i.e., phonetics, a subfield of linguistics) and am inspired by and use knowledge about how humans process speech (i.e., psycholinguistics). The speech signal is a signal that can be researched and viewed from different angles: from the perspective of frequencies (physics), the perspective of the individual sounds (phonetics), meaning (semantics), as a means to convey a message or intent, etc. It also contains different types of information: the words of the message, information about the speaker’s identity, age, gender, height, health status, emotional status, native language, to name only a few.

The focus of my research is on automatic speech recognition. Automatic speech recognisers typically work well for “standard” speakers of a small number of languages. In fact, for only about 2% of all the languages in the world, there is enough annotated speech data to build automatic speech recognisers. Moreover, a large portion of society does not speak in a “standard” way: “standard” speakers are native speakers of a language, without a speech impediment, without a strong regional accent, typically highly educated, and between the ages of 18 and 60 years. As you can tell, this excludes a large portion of our society: children, elderly, people with speaking or voice disorders, deaf people, immigrants, etc. In my work, I focus on making speech technology, and particularly automatic speech recognition, available for everyone, irrespective of how one speaks and the language one speaks. In order to do so, I look at how humans process speech as they are the best speech recognisers that exist; moreover, they can quickly adapt to idiosyncrasies in a speaker’s speech or voice. Moreover, I use knowledge about how sounds and the voice sound differently depending on, for instance, the speaker’s age or health status. So, in my research towards inclusive speech technology, I combine computer science with linguistics and psycholinguistics. Interdisciplinarity is thus at the core of my research.

What disciplines do you combine yourself in your own work?

As explained above, in my research I combine multiple research fields, most notably: computer science, different subfields of psycholinguistics (first and second language learning, native and non-native speech processing; the processing of emotions) and linguistics (primarily phonetics and a bit of conversational analysis).

Could you name a grand research challenge in your current field of work?

There are several grand research challenges in my field:

  • I already named one: making speech technology available for everyone, irrespective of how one speaks and what language one speaks. One of the grand challenges for this is to build speech technology for speech that is not only highly variable but for which also only a small amount of data is available (i.e., low resource scenarios).
  • A second grand challenge: when people speak, they often use words or phrases from another language; this is called code-switching. Automatic speech recognisers are typically built for one language; it is very hard for them to deal with code-switched speech.
  • A third grand challenge: speech is often produced with background noise or background speech present. This deteriorates recognition performance tremendously. Dealing with all the different types of background noise and speech is another grand challenge.

You have been an active champion for diversity and inclusion. Could you tell us a bit more about your activities on these topics?

When I was growing academically, I did not really have a female role model, and especially not female role models who had children. When I was in my late twenties/early thirties, I found this hard because I was afraid that having children would negatively impact my chances for the next academic job and my academic career in general. Also, being not only a first-generation PhD but also a first-generation academic, it took me a really long time to realise that there were unwritten rules, and to learn what these were and how to deal with them (not sure I now know them all 😉). Then, when I became Associate Professor at Radboud University, I found that several students, male and female, regularly came to talk to me about personal and academic issues, and that they found my advice useful; in turn, I found it interesting and motivating to talk to them. I wanted to do more regarding gender equality but didn’t know how.

Then in 2016, a group of senior female speech researchers together organised the Young Female Researchers in Speech Science and Technology Workshop, in conjunction with Interspeech, the flagship conference of the International Speech Communication Association (ISCA), in order to attract more female students into a speech PhD program. I was invited as a mentor. This workshop was highly successful and is now a yearly workshop in conjunction with Interspeech. I joined the organisation of this workshop for 3 years. Then in 2019, having advocated gender equality in the ISCA board of which I’ve been a member since 2017, I was asked to form a new committee: the committee for gender equality. Very quickly this committee started to focus on more than gender and to look at other types of diversity, such as sexual orientation, research areas (ISCA encompasses several speech sub-areas, including phonetics, psycholinguistics, health, automatic speech recognition, speech generation, etc.), and geographical regions. Naturally, we not only wanted to attract people from diverse backgrounds but also wanted to retain them, so we also started to look into inclusion. The first thing our committee did was to create a website where female speech researchers who hold a PhD can list themselves. This website is used to help workshop/conference organisers find female researchers for the organising committee, as panellists and keynote/invited speakers, etc. We then went on to organise diversity and inclusion meetings at Interspeech, and for 2 years we have organised a separate ISCA-queer meeting. We have held a workshop in Africa (remotely due to the pandemic) in order to reach local speech researchers there and see where we can collaborate and where we can help them with our resources and expertise. We wrote a code of conduct for session chairs at workshops/conferences in order for them to know how to balance questions from people from minority groups and non-minority groups, to name but a few of our activities.

In 2020 I came up with the idea for a mentoring programme within the IEEE Signal Processing Society (SPS), for students from minority groups, which was well received and was funded with $50K annually. This programme, loosely based on the YFRSW-format, provides students with a mentor from our society who will supervise them for a period of 9 months, and who will mentor them and help them build a network. Each student receives $4K to visit one of the IEEE (SPS) conferences/workshops. In the first round, we awarded 9 students from all over the world.

In addition to these activities, I’ve also been on the board of the Delft Women in Science (DEWIS) at my university and the chair of the Diversity and Inclusion Committee (EDIT) at my faculty at TU Delft. Additionally, I am regularly asked to appear as a female role model in STEM for young girls and in Dutch media.

In getting to your current position, you experienced some personal hardships. In serving as a public role model, you have been open about these. How can we learn from these experiences to make academia a better place?

My CV shows the (many) consecutive positions I’ve had and how almost all are financed by personal grants that I obtained. These personal grants especially tend to attract a lot of praise. What my CV doesn’t show is the story behind it. It doesn’t show the many job applications I sent out, which never led to a position. It doesn’t show that for a period of more than 2 years I did not have a contract, meaning that I did not have any social security, while I was working on a post-doc position. It does not show how I was bullied at my previous university and the damage that did to my self-esteem, something I still struggle with. It does not show that I had to leave behind my 10-month-old daughter for a month and again for 2 weeks because I was expected to be in Germany for a post-doc position, nor does it show the two bouts of (mild) depression I suffered (one directly related to the bullying). I never talked about all of this because as a temporary (and young and female) researcher, you feel extremely vulnerable because you are so dependent on (the goodwill of) other, more senior researchers. If you don’t want to or cannot do a task, if you complain, they will simply find someone else and you are without a job again. On top of that, you often simply are not believed.

When I became a mentor for students and young researchers, I decided to share some of my struggles so that they knew that they were not the only ones who struggled and that I knew what they were going through. I began to receive feedback from these students that they appreciated my honesty and openness, which gave me the courage to be more open about my own issues. However, only after I received my permanent position at TU Delft (in 2019), and after becoming active in diversity and inclusion, did I very slowly dare to speak more openly to my colleagues and senior people about what had happened.

In late 2019, I was asked to talk about what it is like to be a female researcher in speech technology at the IEEE Workshop on Automatic Speech Recognition and Understanding. I thought about the story I wanted to tell, and eventually, I decided to tell my colleagues, including many of my close friends, my story: I started by showing them my CV, which received a lot of appreciative nods. I then told the story of my life, the story that is not shown by my CV, including the hardships. This resulted in many of my male colleagues and friends crying. Of course, this was never my intention. I don’t think that my story is that much different from that of the average person from a minority group, and probably there are quite a few men whose stories are worse than mine. What I wanted to say was: CVs might look great, or they might not. It is important not to take CVs or facts or numbers at face value; you don’t know what people go through or have done to get where they are. Everyone has a story to tell; but it is, unfortunately, the case that the bad stories far more often happen to women and other people from minority groups.

A third message or piece of advice is that if you go through a hard time, know that you are not alone. In life in general, and in academia particularly, we celebrate successes, but failures and hardships are ignored and are often considered a weakness. I strongly believe that by being open about your hardships, you will feel better yourself, and will help others deal with their hardships.

Finally, we need to see fellow academics as people and treat them as people. We need to be supportive of one another, especially of our younger colleagues and of those from minority groups. We should be mentors and role models. We should listen to what they are saying and believe what they are saying. Not question what they say, but believe them when they describe something bad that has happened to them and help, because daring to speak up takes an enormous amount of strength and courage. If one dares to speak up, believe that it is true and tell them that you know how courageous they have to be to speak up.

How and in what form do you feel we as academics can be most impactful?

As academics we have many responsibilities: we teach the younger generation, we investigate and develop new technology and theories. Some of our research has a direct impact on society, some research does not yet, some research will maybe never have a direct impact on society. I don’t believe that all research needs to have an impact. I do believe that we as academics can be impactful, and that is by explaining science to the general audience. What is science? Why don’t scientists have answers to all questions? Why is what you do important? By explaining one’s research in layman’s terms, science and scientific output will become easier to understand for non-scientists. It will help shape public debate. It will lead to scientific results not being as easily dismissed as nowadays often happens. At the same time, and at least as important: by talking to people from the general public, you as an academic will see the world through their eyes, look at the impact of your work in a different way, and I am convinced it will also often lead to the explanation of why a certain development or technology is not adopted by society at large or by a particular group in society. In short, academics can be most impactful by communicating with the general public, and communication is and thus should be a two-directional process.


Bios

Dr Odette Scharenborg is an Associate Professor and Delft Technology Fellow at the Multimedia Computing Group at the Delft University of Technology, the Netherlands, and the Vice-President of the International Speech Communication Association (ISCA). Her research focuses on human speech-processing inspired automatic speech processing with the aim to develop inclusive speech technology, i.e., speech technology that works for everyone irrespective of how they speak or the language they speak.

Odette has been on the Board of ISCA since 2017, where she is also the chair of the Diversity committee (since 2019) and was co-chair of the Interspeech Conferences committee and of the Technical Committee (2017-2019). From 2018-2021, Odette was a member of the IEEE Speech and Language Processing Technical Committee (subarea Speech Production and Perception). From 2019-2021, she was an Associate Editor of IEEE Signal Processing Letters, where she now is a Senior Associate Editor.

Editor Biographies

Dr Cynthia C. S. Liem is an Associate Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. Her research interests focus on making people discover new interests and content which would not trivially be retrieved in music and multimedia collections, assessing questions of validation and validity in data science, and fostering trustworthy and responsible AI applications when human-interpreted data is involved. She initiated and co-coordinated the European research projects PHENICX (2013-2016) and TROMPA (2018-2021), focusing on technological enrichment of digital musical heritage, and participated as technical partner in an ERASMUS+ education innovation project on Big Data for Psychological Assessment. She gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach, Researcher-in-Residence 2018 at the National Library of The Netherlands, general chair of the ISMIR 2019 conference, and keynote speaker at the RecSys 2021 conference. Presently, she co-leads the Future Libraries Lab with the National Library of The Netherlands, is track leader of the Trustworthy AI track in the AI for Fintech lab with the ING bank, holds a TU Delft Education Fellowship on Responsible AI teaching, and is a member of the Dutch Young Academy.

Dr Jochen Huber is Professor of Computer Science at Furtwangen University, Germany. Previously, he was a Senior User Experience Researcher with Synaptics and an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

Multidisciplinary column: the importance of talking to a 12-year old

In 2018, while on a research visit to Bordeaux, I felt it would be good to connect more closely to the local community. As a consequence, colleagues convinced me to join the Femmes & Sciences movement, in which women researchers in STEM proactively did local outreach.

My French was conversational, though not stellar. But I thought it should hopefully be good enough to converse with young teenagers. Furthermore, speaking of ‘local community’, it would be a nice way to get to know both colleagues and the culture of the local schools. So there I went, speaking at a countryside school in one of the many wine regions, and at a secondary school in Bordeaux where students would not trivially think of STEM university careers.

It was an amazing and enlightening experience. As soon as I started to talk about search engines, recommender systems, music and video services, a spark really ignited in the students. They knew these, and they used them daily!

But it was only because I mentioned it that they started realizing there was computer science technology behind all these services. Before, they had no clue.

And I think this is a real problem that we as a community severely undervalue.

In my own family, my father (electrical engineering), sister (civil engineering & geomatics) and I (computer science) studied to become engineers. For the rest of my family, this meant we were ‘the technical people’, getting called in when computers were slow, cell phones were updated and printers started malfunctioning. This especially happened to my father and me, as we ‘were good with computers, since that was our profession’.

But I had not studied to fix printers. And, as I joked during university open days to prospective students, my sister never got asked to go fix the kitchen sink, even though she had been taught about water management.

It always has been striking to me how malfunctioning hardware and software were the first associations that laypeople outside of our field seemed to have with our work. Today, this is broadening to fears of hacking, and on the less negative side, (overblown?) hopes in AI and cryptocurrency. In all these cases, the technology is something alien, something that ‘normal’ humans do not understand and grasp well.

Yet at the same time, the technologies we build affect everyone’s lives, increasingly so. Frequently, they silently work in the back, and we indeed only visibly notice them if something goes wrong. But then, rather strange associations and dialogues emerge.

Recently, I became a member of the national Young Academy, a body of earlier-career faculty across disciplines in The Netherlands, playing a public opinion-making role on academic culture, the image of academia and its findings, and associated policy-making. Through this role, and with my background in search and recommendation, I am increasingly being invited into committees, workshops and other forms of public appearances that involve policy-makers and laypeople concerned with the impact of AI technologies (especially: possible exclusion of humans, as a consequence of the use of AI technologies).

In these activities, it has again been striking to me how little common vocabulary is present, and how questions thus get formulated awkwardly. More than once, I have been asked ‘what the algorithm exactly is doing’, when my discussion partners actually refer to broader decision-making processes, where problems may occur across the pipeline, even before any algorithm would be deployed.

When I try to explain that many of the applications of interest focus on prioritization with a cutoff within a larger collection, and I ask how my discussion partners would prioritize, I get blank stares if I keep this story at the current, general, abstract level that would come naturally to me as a computer scientist. If I’m unlucky, I may even get an answer back that my discussion partners don’t want to take a stance themselves, as it is a ‘difficult and subjective’ matter, but ‘surely AI can do this better than we humans?’. Now that will be a problem if we frame the problem in a supervised learning setup, without a sense of solid ground truth or criteria to optimize for.

However, going through simple, concrete examples ‘close to home’ does seem to help. Here, I really benefited from the experience I had gained while in Bordeaux and beyond, especially in setups where I had to work with children.

Try to explain concepts of information retrieval and data modelling in a non-native tongue to a 12-year old, and you are forced to ask simple questions, that will give insight into these children’s own world views and contexts. It will give them building blocks they recognize and can build on.

Working in music and multimedia has greatly helped me here; as said before, everyone is a heavy daily user of music and multimedia services, and thus (without explicitly knowing) actually has some world view ready on preferences, priorities and ways to navigate larger information collections. This will greatly help as a discussion starter, with the discussion elements remaining tangible for everyone.

I would argue that working on a better public understanding of our work is among the most societally impactful roles that we, as researchers in the field, can play. Our discussion partners are stakeholders who don’t realize they are stakeholders. And of course, in the case of children, they may at the same time be the future technologists, who will build on our work.

It takes serious time investment and a lot of practice to get this right. I have always been puzzled that, for this reason, it is typically considered too much of a time sink, and not our prime responsibility as academics. But who else would otherwise take this up?

And if I think of how much time I have been encouraged to sink into endlessly rewriting grant proposals or papers at the micro-level, just to hopefully please reviewers, something does not feel right. Any acceptances following this have arguably been good for my career. But I am not quite convinced this has been a more meaningful use of the public money my contract is funded from.

Or, in a more positive interpretation: in our community, we actually care about communicating well, and are clearly willing to invest in it. But so far, we really have been focusing our attention inward, while there is a lot to gain when we’d rather look outward.

So for those who would be interested in engaging more with those outside of our field: please do. Outreach is much more than cute PR. And with the applications that we work on being so close to people’s daily lives, we in music/multimedia hold some very important keys, and really should learn the perspectives of our end users.

So let’s use those keys, and finally get some doors opened that have remained shut for too long.

Editor Biographies

Dr. Cynthia C. S. Liem is an Associate Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. Her research interests focus on making people discover new interests and content which would not trivially be retrieved, and assessing questions of validation and validity, especially in the context of music and multimedia search and recommendation. She initiated and co-coordinated the European research projects PHENICX (2013-2016) and TROMPA (2018-2021), focusing on technological enrichment of digital musical heritage, and gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach, and is a member of the Dutch national Young Academy.

Multidisciplinary Column: Conferences as Career and Community Catalysts

A little over 10 years ago, I chose to pursue a PhD. This meant I chose a professional life in which research publications and their uptake would be seen as major evidence of achievement. For those working in computer science, the major dissemination platforms for such publications are conferences.

Given my dual background in music and computer science, it was logical that my main interests were in topics that connected both these worlds. As a consequence, I hoped to become part of the Music Information Retrieval community. The International Society for Music Information Retrieval (ISMIR) therefore seemed the professional community to target, and the annual ISMIR conference the most logical place to present my work.

In terms of its education and research, my department at TU Delft had track records and agendas in visual and social multimedia content analysis, but not particularly in music. Considering methodology and philosophy, I did think a lot of the work at the department was compatible with what I tried to do in music. Furthermore, as I still was in training in a selective major at the conservatoire, I was not in a good position to move geographically to any other institute that would have a more established Music Information Retrieval track record. So I inquired whether I could stay in Delft to pursue my PhD.

The answer was somewhat complicated. There was no funding for a PhD position in Music Information Retrieval, and there were no strategic plans to change that. At the same time, the people who had supervised me as a student (in particular, my thesis supervisor Alan Hanjalic) saw promise in me, and wanted to keep working with me. Ultimately, I got a one-year contract in which my main task was to try to acquire funding and international community backing to pursue a Music Information Retrieval PhD in a multimedia group.

At the start of that year, I got to attend my first ISMIR conference, where I presented a paper based on my master’s thesis. In a previous column for the SIGMM Records, I already discussed my experiences at that moment: how debuting alone at a conference was intimidating, but how I was lucky that senior members of the community proactively made sure I got introduced to other attendees. Frans Wiering, the senior member who looked after me in particular at that moment, was general chair of the upcoming ISMIR, which would take place in Utrecht, so in my home country. Frans was quick to invite me to serve as a student volunteer, which was very good news for me. As my year would be filled with grant-writing, I did not yet have a sufficiently stable infrastructure around me to be able to truly do research, so submitting to the next ISMIR was out of reach. But this way, I could still attend the conference, and would even have an excuse to keep mingling with all the attendees, as we as volunteers would be the first people to answer any participant questions regarding logistics.

Getting funding turned out to be a true challenge. In 2009, digital music consumption was not as large yet as it is today, and many potential data-providing partners were reluctant to collaborate. Of course, it also did not help my cause that I still was a complete nobody. Finally, when working on music, one faces an interesting paradox. On the one hand, many people, regardless of their backgrounds, identify with music, up to the point that they personally deeply care about it. As such, working on music makes for a good conversation starter, in which people are always happy to share their personal experiences. On the other hand, this makes music a commonplace topic, which risks it being shoved aside as ‘less serious’. Even though technically, the problems we are working on are framed in very similar ways as they may be in neighboring domains such as vision (and the research challenges are at least as hard, if not harder, due to subjective human factors being an integral part of the problem), common criticisms we receive are that music is fun but does not save lives, and does not deal with areas of major economic impact, nor easily measurable societal impact. So while we never have any problems legitimizing our work in public outreach, in grant-writing, we always need to put in extra effort to explain why our work is more than a fun hobby, and sufficiently relevant to justify serious funding.

After several collaboration rejections, and the one proposal I did manage to set up getting rejected despite good review scores, I was very lucky that at the very end of my grant-writing year, I managed to secure PhD funding through a Google Doctoral Fellowship (now PhD Fellowship). For this, I needed to get a research mentor, although my Google contacts weren’t so sure who would be appropriate for this role, as they were not aware of anyone working in music in the company at that stage.

Several weeks later, I was volunteering at ISMIR in Utrecht. That was where I found out that Douglas Eck had just moved from academia to industry, to work on music research at Google. And that was how I got my research mentor, with several extremely useful interning experiences at the company as a consequence.

When Emilia Gómez, the 2018-2019 president of the ISMIR society, invited me to become general co-chair of ISMIR’s 20th anniversary edition with her, and host the event in Delft, this was my chance to give back. Now I had general chair powers, and as the society was quite open to discussing any innovations, I could try realizing the conference of my dreams.

As described in my previous column, the inclusive spirit of ISMIR has always been quite elaborate, including mentoring programs spearheaded by our Women in MIR movement, an explicit focus on multidisciplinarity over exclusivity, and on being medium-sized but single-track. For the past two years, all our accepted papers have been presented in a 4-minute presentation and a poster, such that all the works get equal visibility. This year, we chose to not do themed sessions but to randomize the paper order, such that authors on related topics would not be presenting their posters at the same time. As a side-effect, this also would nudge attendees towards learning about everything that got accepted, beyond the topics of their specializations. This is something I have seen the ISMIR community always being enthusiastic about, while I had very different experiences at (more prestigious) larger-sized conferences. In many cases, their larger size led to many parallel tracks with fragmented audiences, while any plenary program elements were so massive that it was hard to engage with anyone you did not happen to know already, or incidentally happened to stand or sit next to.

We made sure we offered more than paper presentations. For the keynotes, we invited speakers from neighboring fields and disciplines, and encouraged them to give some critical perspectives on our field. We engaged with a local school in an outreach program. Before the conference, we held workshops, including the Women in MIR prototyping workshop, so people would already get to know one another; we had a dedicated Newcomer Initiatives chair to make sure no one felt lost, and the socials were set up such that people could really mingle. With many people in music also happening to be active music players, we offered both formal and informal options to jam together, so that week, several cafes in Delft faced more live music than we would normally see.

But while I was preparing for this conference, one of my strongest experiences was that I kept being haunted by these memories of the past: that being able to join this community (and an academic career at all) had been a really close call, that really was catalyzed by me having been able to join the conferences, and having met supportive seniors, while I was still an early-stage student without a full research embedding.

So one of the ISMIR 2019 achievements I am most proud of was that we extended our financial support programs, enabled by the ISMIR board and sponsorship funds. Beyond the existing grants for student authors and female participants, we added a third ‘community grant’ category, meant for individuals who would like to attend ISMIR, but who had not been in the capacity to actively participate in the conference at this stage. Reading through the motivation letters for this grant made me realize that my experiences were not as much of a freak case, and that colleagues have been facing similar challenges.

I am deeply grateful that these grants enabled us to get more people over to ISMIR: young professionals in between positions; students in other disciplines seeking to collaborate more closely on music topics; students that have found themselves as sole people in their labs working on music, as the labs faced other strategic priorities; but also, seniors who used to be members of our field, but who had gradually been drifting out, when entering a vicious circle of not getting music projects funded, then having to do more teaching in other topics, and then taking hits on their research output and profile. It was a wonderful experience seeing all of them actively mingling with the community, and hearing how being at ISMIR indeed had been personally impactful for them.

For my student volunteers, I especially targeted local and national students who were not yet at the PhD level, such that they could experience our academic atmosphere. Here as well, I saw the positive impact of the ISMIR spirit; several of these students (of whom I am not even the thesis supervisor…) made friends with international colleagues, and are even trying to collaborate on music information research with them in their free time today.

Hopefully, this story can help inspire colleagues who are seeking to make their conference cultures more inclusive and impactful. With this, I do want to add a warning that endeavors like this will not come for free, but demand considerable extra work and advocacy. Many of our proposed innovations initially faced pushback in some form, as these were not how things normally were done, and they required financial and human resources that would not be normally accounted for. But I am very grateful that we followed through, and extremely proud of what we achieved in the end. My great thanks go to the ISMIR society, my fellow ISMIR 2019 organizers and our sponsors for their trust and support.

All ISMIR 2019 presentations have been recorded, and are available through this link. The accepted (open access) papers with supplementary material are available via this page. Photos of the socials are available here.


About the Column

The Multidisciplinary Column is edited by Cynthia C. S. Liem and Jochen Huber. Every other edition, we will feature an interview with a researcher performing multidisciplinary work, or a column of our own hand. For this edition, we feature a column by Cynthia C. S. Liem.

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. Her research interests consider search and recommendation for music and multimedia, with special interest in making people discover new interests, as well as questions of interpretability and validity. She initiated, co-coordinated and participated in various (inter)national collaborative research projects on the accessibility of content which would not trivially be retrieved, both in the music/cultural heritage world, as well as in social sciences applications, e.g. collaborating with organizational psychologists. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach. In 2018, she was Researcher-in-Residence at the National Library of The Netherlands, and in 2019, she served as general co-chair of the ISMIR conference.

Dr. Jochen Huber is a Senior User Experience Researcher at Synaptics. Previously, he was an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

Multidisciplinary Column: An Interview with Emilia Gómez

Could you tell us a bit about your background, and what the road to your current position was?

I have a technical background in engineering (telecommunication engineer specialized in signal processing, PhD in Computer Science), but I have also followed formal musical studies at the conservatory since I was a child. So I think I have an interdisciplinary background.

Could you tell us a bit more about how you have encountered multidisciplinarity and interdisciplinarity both in your work on music information retrieval and your current project on human behavior and machine intelligence?

Music Information Retrieval (MIR) is itself a multidisciplinary research area intended to help humans better make sense of music data. MIR draws from a diverse set of disciplines, including, but by no means limited to, music theory, computer science, psychology, neuroscience, library science, electrical engineering, and machine learning.

In my current project HUMAINT at the Joint Research Centre of the European Commission, we try to understand the impact that algorithms will have on humans, including our decision making and cognitive capabilities. This challenging topic can only be addressed in a holistic way and by incorporating insights from different disciplines. At our kick-off workshop, we gathered researchers working in distant fields, e.g. from computer science to philosophy, including law, neuroscience and psychology, and we realised the need to engage in scientific discussions from different views and perspectives to address human challenges in a holistic way.

What have, in your personal experience, been the main advantages of multidisciplinarity and interdisciplinarity? Have you also encountered any disadvantages or obstacles?

The main advantage I see is the fact that we can combine distinct methodologies to generate new insights. For researchers, stepping out of a discipline’s comfort zone makes us more creative and innovative.

One disadvantage is the fact that when you work in a multidisciplinary field you seem not to fit into traditional academic standards. In my case, I am perceived as a musician by engineers and as an engineer by musicians.

Beyond the academic community, your work also closely connects to interests by diverse types of stakeholders (e.g. industry, policy-makers). In your opinion, what are the most challenging aspects for an academic to operate in such a diverse stakeholder environment?

The most challenging part of diverse teams is communication, e.g. being able to speak the same language (we might need to create interdisciplinary glossaries!) and explain our research in an accessible way so that it is understood by people with diverse backgrounds and expertise.

Regarding your work on music, you often have been speaking about making all music accessible to everyone. What do you consider the grand research challenges regarding this mission?

Many MIR researchers desire that technology can be used to make all music accessible to everyone, i.e. that our algorithms can help people discover new music, develop a varied musical taste and make them open to new music and, at the same time, to new ideas and cultures. We often talk of our desire that MIR algorithms help people discover music in the so-called ‘long tail’, i.e. music that is not so popular or present in the mainstream scenario. I believe the variety of music styles reflects the variety of human beings, e.g. in terms of culture, personalities and ideas. Through music we can then enrich our culture and understanding.

As the newly elected president of the ISMIR society, are there any specific missions regarding the community you would like to emphasize?

I have had the chance to work with an amazing ISMIR board over the last years, an incredible group of people willing to contribute to our community with their talent and time. This team is very easy to work with!

This year, ISMIR is organizing its 19th edition (yes, we are getting old)! There are many challenges at ISMIR that we as a community should address, but at the moment I would like to emphasize some relevant aspects that are now somehow a priority for the board.

The first one is to maintain and expand its scientific excellence, as ISMIR should continue to provide key scientific advancements in our field. In this respect, we have recently launched our open access journal Transactions of ISMIR to foster the publication of deeper and more mature research works in our area.

The second one is to promote variety in our community, e.g. in terms of discipline, gender or geographical location, also related to music culture and repertoire. In this respect, and thanks to our members, we have promoted ISMIR taking place at different locations, including editions in Asia (e.g. 2014 in Taipei, Taiwan, and 2017 in Suzhou, China).

Other aspects we value are reproducibility, openness and accessibility. In this sense, our priority is to maintain affordable registration rates, taking advantage of sponsorships from our industrial members, and devote our membership fees to provide travel funds for students or other members in need to attend ISMIR.

How and in what form do you feel we as academics can be most impactful?

The academic environment gives you a lot of flexibility and freedom to define research roadmaps, although there are always some dependencies on funding. In addition, academia provides time to reflect and go deep into problems that are not directly related to a product in the short term. In the technological field, academia has the potential to advance technologies by focusing on deeper understanding of why these technologies work well or not, e.g. through theoretical analysis or comprehensive evaluation.

You also have been very engaged in missions surrounding Women in STEM, for example through the Women in MIR initiatives. In discussions on fostering diversity, the importance of role models is frequently mentioned. How can we be good role models?

Yes, I have become more and more concerned about the lack of opportunities that women have in our field with respect to their male colleagues. In this sense, Women in MIR is playing a major role in promoting the role and opportunities of women in our field, including a mentoring program, funding for women to attend ISMIR, and the creation of a public repository of female researchers to make them more visible and present.

I think women are already great role models in their different profiles, but they lack visibility with respect to their male colleagues.


Bios

Dr. Emilia Gómez graduated as a Telecommunication Engineer at Universidad de Sevilla and studied piano performance at the Seville Conservatoire of Music, Spain. She then received a DEA in Acoustics, Signal Processing and Computer Science applied to Music at IRCAM, Paris and a PhD in Computer Science at Universitat Pompeu Fabra in Barcelona (2006). She has been a visiting researcher at the Royal Institute of Technology, Stockholm (Marie Curie Fellow, 2003), McGill University, Montreal (AGAUR competitive fellowship, 2010), and Queen Mary University of London (José de Castillejos competitive fellowship, 2015). After her PhD, she was first a lecturer in Sonology at the Higher School of Music of Catalonia and then joined the Music Technology Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra in Barcelona, Spain, first as an assistant professor and then as an associate professor (2011) and ICREA Academia fellow (2015). In 2017, she became the first female president of the International Society for Music Information Retrieval, and in January 2018, she joined the Joint Research Centre of the European Commission as Lead Scientist of the HUMAINT project, studying the impact of machine intelligence on human behavior.

Editor Biographies

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.

Dr. Jochen Huber is a Senior User Experience Researcher at Synaptics. Previously, he was an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

Multidisciplinary Column: Inclusion at conferences, my ISMIR experiences

In 2009, I attended my very first international conference. At that time, I had recently graduated with my Master’s degree in Computer Science, and was just starting the road towards a PhD; in parallel, I had also started pursuing my Master’s degree in Piano Performance at the conservatoire. As a computer scientist, I had conducted my MSc thesis project on cover song retrieval, which had resulted in an accepted paper at ISMIR, the yearly conference of the International Society for Music Information Retrieval.

That something like ‘Music Information Retrieval’ (Music-IR) existed, in which people performed computer science research in the music domain, fascinated me deeply. While I was training to become both a musician and a computer scientist, up to that point, I mostly had been encouraged to keep these two worlds as segregated as possible. As a music student, I would be expected to be completely and exclusively committed to my instrument; I often felt like a cheater when I was working on my computer science assignments. As a computer scientist, many of my music interests would be considered to be on the ‘artistic’, ‘subjective’ or even ‘fluffy’ side; totally fine if that was something I wanted to spend my hobby time on, but seriously integrating this with cold, hard computer science techniques seemed quite unthinkable.

Rather than having gone to a dedicated Music-IR group, I had remained at Delft University of Technology for my education, seeing parallels between the type of Multimedia Computing research done in the group of Alan Hanjalic, and problems I wanted to tackle in the music domain. However, that did mean I was the only one working on music there, and thus, that I was going to travel on my own to this conference…to Kobe, Japan, literally on the other end of the globe.

On the first day, I felt as impressed as I felt intimidated and lonely. All those people whose work I had read for years now became actual human beings I could talk to. Yet, I would not quite dare to walk up to them myself…surely, they would have more interesting topics to discuss with more interesting people than me!

However, I was so lucky to get ‘adopted’ by Frans Wiering from Utrecht University, a well-known senior member of the community, who knew me from The Netherlands, as I had attended a seminar surrounding the thesis defense of one of his PhD students in the past. Before I got the chance to silently vanish into a corner of the reception room, he started proactively introducing me to the many people he was talking to himself. In the next days, I naturally started talking to these people as a consequence, and became increasingly confident in initiating new contacts myself.

With ISMIR being a single-track conference, I got the chance to soak up a very diverse body of work, presented by a very diverse body of researchers, with backgrounds ranging from machine learning to musicology. At one point, there was a poster session in which I discussed a signal processing algorithm with one of the presenters, turned around, literally remaining at the same physical location, and then discussed historical music performance practice with the opposite presenter. At this venue, the two parts of my identity which I so far had largely kept apart, turned out to actually work out very well together.

I have attended many ISMIRs since, and time and time again, I kept seeing confirmations that a diversity of backgrounds, within attendees and between attendees, was what made the conference strong and inspiring. Whether we identify as researchers in signal processing, machine learning, library sciences, musicology, or psychology, what connects us all is that we look at music (and personally care about music), which we validly can do in parallel, each from our respective dedicated specialisms.

We do not always speak the same professional language, and we may validate in different ways. It requires effort to understand one another, more so than if we would only speak to people within our own niche specializations. But there is a clear willingness to build those bridges, and learn from one another. As one example, this year at ISMIR 2017, I was invited on a panel on the Future of Music-IR research, and each of the panelists was asked what works or research directions outside of the Music-IR community we would recommend for the community to familiarize itself with. I strongly believe that discussions like this, aiming to expand our horizons, are what we need at conferences…and what truly legitimizes us traveling internationally to exchange academic thoughts with our peers in person.

I also have always found the community extremely supportive in terms of reviewing. Even in case of rejections, one would usually receive a constructive review back, with multiple concrete pointers for improvements. Thanks to proactive TPC member actions and extensive reviewer guidelines with examples, the average review length for papers submitted to the ISMIR conference went up from 390 words in 2016 to 448 words in 2017.

As this was the baseline I was originally used to, my surprise was great when I first got confronted with the feared ‘two-line review’…as it sadly turned out, that actually was the more common type of review in research at large. We recently have been discussing this within the SIGMM community, and in those discussions, more extensive reviewer guidelines seemed to be considered a case of ‘TL;DR’ (‘reviewers are busy enough, they won’t have time to read that’). But this is a matter of how we want our academic culture to be. Of course, a thorough and constructive review needs more time commitment than a two-line review, and this may become a problem in situations of high reviewer load. But rather than silently trying to hack the problem as individual reviewers (with more mediocre attention as likely consequence), maybe we should be more consciously selective of what we can handle, and openly discuss it with the community in case we run into capacity issues.

Back to the ISMIR community, more institutionally, inclusion has become a main focus point now. In terms of gender inclusion, a strong Women in MIR (WiMIR) group emerged in the past years, enabling an active mentoring program, and arranging for travel grant sponsoring to support conference attendance of female researchers. But impact reaches beyond gender inclusion. WiMIR also introduced a human bingo at its receptions, for which conference attendees with various characteristics (e.g. ‘has two degrees’, ‘attended the conference more than five times’, ‘is based in Asia’) need to be identified. A very nice and effective way to trigger ice-breaking activities, and to have attendees actively seeking out people they did not speak with yet. That the responsibility to get included at events should not only fall upon new members, but actively should be championed by the existing ‘insiders’, also recently was emphasized in this great post by Eric Holscher.

So, is ISMIR the perfect academic utopia? No, of course we do have our issues. As a medium-sized community, fostering cross-domain interaction goes well, but having individual specializations gain sufficient momentum needs an explicit outlook beyond our own platform. And we also have some status issues. Our conference, being run by an independent society, is frequently omitted from conference rankings; however, the independence is on purpose, as this will better foster accessibility of the venue towards other disciplines. And with an average acceptance rate around 40%, we often are deemed as ‘not sufficiently selective’…but in my experience, there usually is a narrow band of clear accepts, a narrow band of clear rejects, and a broad grey-zone band in the middle. And in more selective conferences, the clear rejects typically have a larger volume, and are much worse in quality, than the worst submissions I have ever seen at ISMIR.

In any case, given the ongoing discussions about SIGMM conferences, multidisciplinarity and inclusion, I felt that sharing some thoughts and observations from this neighboring community would be useful.

And…I really look forward already to serving as a general co-chair of ISMIR’s 20th anniversary in 2019—which will be exactly 10 years after my first, shy debut in the field.


About the Column

The Multidisciplinary Column is edited by Cynthia C. S. Liem and Jochen Huber. Every other edition, we will feature an interview with a researcher performing multidisciplinary work, or a column of our own hand. For this edition, we feature a column by Cynthia C. S. Liem.

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.

Dr. Jochen Huber is a Senior User Experience Researcher at Synaptics. Previously, he was an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and ACM International Conference on Interface Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com