Multidisciplinary Column: Lessons Learned from a Multidisciplinary Hands-on Course on Interfaces for Inclusive Music Making

This short article reports on lessons learned from a multidisciplinary hands-on course that I co-taught in the academic winter term 2021/2022. Over the course of the term, I co-advised a group of 4 students who explored designing interfaces for Musiklusion [1], a project focused on inclusive music making using digital tools. Inclusive participation in music making processes is a topic at home in the Multimedia community, as well as in many neighbouring disciplines (see e.g. [2,3]). In the following, I briefly detail the curriculum, describe the Musiklusion project, outline challenges and report on the course outcome. I conclude by summarizing a set of personal, albeit anecdotal, observations from the course that could be helpful for fellow teachers who wish to design a hands-on course with inclusive design sessions.

When I rejoined academia in 2020, I was given the unique opportunity to take part in teaching activities pertaining to, among other things, human-centered multimedia within a master’s curriculum on Human Factors at Furtwangen University. Within this 2-year master’s programme, one of the major mandatory courses is a 4-month hands-on course on Human Factors Design. I co-teach this course jointly with 3 other colleagues from my department. We expose students to multidisciplinary research questions which they must investigate empirically in groups of 4-6. They have to come up with tangible results, e.g. a prototype or qualitative and quantitative data as empirical evidence.

Last term, each of us instructors advised one group of students. Each group was also assigned an external partner to help ground the work and embed it into a real-world use case. The group of students I had the pleasure to work with partnered with Musiklusion’s project team. Musiklusion is an inclusive project focused on accessible music making with digital tools for people with so-called disabilities, who work and make music alongside people without disabilities. These disabilities include, for example, cognitive disabilities and impairments of motor skills, with conditions that continue to progress. Movements, gestures and, eventually, tasks that can be performed today (e.g. being able to move one’s upper body) cannot be taken for granted in the future. Thus, as an overarching research agenda for the course project, the group of students explored the design and implementation of digital interfaces that enable people with cognitive and/or motor impairments to actively participate in music making processes and possibly sustain their participation in the long run, depending on their physical abilities.

Figure 1. Current line-up of instruments of Project Musiklusion (source: Musiklusion feature with Tabea Booz & Sharon)

Project Musiklusion is spearheaded by musician and designer Andreas Brand [4], partnering with Lebenshilfe Tuttlingen [5]. The German Lebenshilfe is a nation-wide charitable association for people with so-called disabilities. Musiklusion’s project team makes two salient contributions: (i) orchestrating off-the-shelf instruments such that they are “programmable” and (ii) designing, developing and implementing digital interfaces that enable people with so-called disabilities to make music using said instruments. The project’s current line-up of instruments (cf. Figure 1) comprises a Disklavier with a MIDI port and an enhanced drum set with drivers and mechanical actuators [6]. Both instruments can be controlled from Max/MSP via OSC. Hence, tools like TouchOSC [7] can be leveraged to design 2D widget-based graphical user interfaces to control each instrument. While a musician with impaired motor skills in the upper body might not be able to play individual notes using a touch interface or the actual Disklavier, for instance, digital interfaces and widgets can be used to vary e.g. the pitch or tempo of musical themes.
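To make this control path concrete: a widget interface sends OSC messages over UDP to a machine running a Max/MSP patch, which in turn drives the Disklavier via MIDI or the drum set’s actuators. The following minimal Python sketch, using the python-osc library, illustrates the same idea programmatically; the host, port and OSC addresses are hypothetical placeholders for whatever the receiving patch defines.

```python
# Minimal sketch of widget-to-instrument control via OSC (hypothetical addresses).
from pythonosc.udp_client import SimpleUDPClient

# Host and port of the machine running the Max/MSP patch (assumed values).
client = SimpleUDPClient("192.168.0.10", 8000)

def on_pitch_slider(value: float) -> None:
    """Map a slider position (0.0-1.0) to a transposition in semitones (-12..+12)."""
    semitones = round((value - 0.5) * 24)
    client.send_message("/disklavier/pitch", semitones)

def on_tempo_slider(value: float) -> None:
    """Map a slider position (0.0-1.0) to a playback tempo of 60-180 BPM."""
    client.send_message("/disklavier/tempo", 60 + value * 120)

on_pitch_slider(0.75)  # transpose the current theme up by 6 semitones
on_tempo_slider(0.5)   # play at 120 BPM
```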

With sustainable use of the above instruments in mind, the group of students aimed to explore alternative input modalities that could be used redundantly depending on a musician’s motor skills. They conducted weekly sessions with project members of Musiklusion over the course of about 2.5 months. Most of the project members use a motorized wheelchair and have limited upper body movement. Each session ran from 1 to 3 hours, depending on the availability of project members; typically 2-5 members were present. The sessions took place at Lebenshilfe Tuttlingen, where the instruments are based and used on a daily basis. Based on in-situ observations and conversations, the group of students derived requirements and user needs to inform interface designs. They also led weekly co-design sessions in which they prototyped both interfaces and interactions and tried them out with project members. Reporting on the actual iterative design sessions, the employed methodology (cf. [8,9]) and the data gathered is beyond the scope of this short article and should be presented at a proper venue focusing on human-centred multimedia. Yet, to provide a glimpse of the results: the group of students came up with a set of 4 different interfaces that cater to individual abilities and can be used redundantly with both the Disklavier and the drum kit. They designed (a) body-based interactions that can be employed while sitting in a motorized wheelchair, (b) motion-based interactions that leverage accelerometer and gyroscope data of e.g. a mobile phone held in hand or strapped to an upper arm, (c) an interface that relies on face tracking to leverage facial expressions and (d) an eye-tracking interface that leverages eye movement for interaction. At the end of the course, and amidst the corona pandemic, these interfaces were used to enable the Musiklusion project members to team up with artists and singers Tabea Booz and Sharon to produce a music video remotely. The music video is available at https://www.youtube.com/watch?v=RYaTEYiaSDo and showcases the interfaces in actual productive use.
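To give a flavour of how interface (b) might work under the hood, here is a hedged sketch that maps accelerometer samples to drum triggers over the same hypothetical OSC setup as above; the threshold, scaling and address are illustrative assumptions, not the students’ actual implementation.

```python
# Hedged sketch: turning accelerometer peaks into drum hits (assumed parameters).
import math
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("192.168.0.10", 8000)
HIT_THRESHOLD = 1.8  # acceleration magnitude in g that counts as a strike (assumed)

def on_accel_sample(ax: float, ay: float, az: float) -> None:
    """Called for each accelerometer sample from a phone strapped to the upper arm."""
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude > HIT_THRESHOLD:
        # Scale the hit velocity by how far the threshold was exceeded.
        velocity = min(1.0, (magnitude - HIT_THRESHOLD) / 2.0)
        client.send_message("/drums/snare/hit", velocity)
```

A mapping like this decouples the musician’s range of motion from the instrument: the threshold and scaling can be re-tuned per person and per session as motor abilities change.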

In the following, I enumerate personal lessons learned as an advisor and course instructor. Although these observations stem from only a single term and a single group of students, I still find them worthwhile to share with the community.

  • Grounding of the course topic is key. Teaming up with an external partner who provides a real-world use case had a tremendous impact on how the project went. The course could also have taken place without involving Musiklusion’s project members and actual instruments, but designs and implementations would then have suffered from low external validity. Furthermore, this would have made it impossible to conduct co-design sessions.
  • Project work must be meaningful and possibly impactful. The real-world grounding of the project work, and therefore also the pressure to deliver progress to Musiklusion’s project members, kept students extrinsically motivated. Beyond that, I observed students being engaged at a very high level and going above and beyond to deliver constantly improved prototypes. From the conversations I had, I felt that both the meaningfulness of their work and the impact they had motivated them intrinsically.
  • Course specifications should be tailored to students’ interest in acquiring new skills. It might seem obvious (cf. [10]), but this course made me realize again how important it is to cater to students’ interest in acquiring new skills and to match that interest to course specifications. The outcome of this project would have been entirely different had the students not been interested in learning how to build, deliver and test-drive prototypes iteratively at a high pace. This certainly also served as an additional intrinsic motivation.

In conclusion, teaching this course was a unique experience for me, as well as for the students involved in the course work. It was certainly not the first hands-on course I have taught, and hands-on course work is a staple of many HCI curricula across the globe. But I hope that this anecdotal report further inspires fellow teachers to partner with (charitable) organizations to co-teach modules and have them sponsor real-world use cases that motivate students both extrinsically and intrinsically.

Acknowledgements

I want to extend special thanks to participating students Selina Layer, Laura Moosmann, Marvin Shopp and Tobias Wirth, as well as Andreas Brand, Musiklusion project members and Lebenshilfe Tuttlingen.

References

[1] Musiklusion Project Webpage. https://www.musiklusion.de. Last accessed: June 28, 2022.

[2] Hornof A, Sato L. (2004). EyeMusic: making music with the eyes. In: Proceedings of the 2004 conference on New interfaces for musical expression, pp 185–188.

[3] Petry B, Illandara T, Nanayakkara S. (2016). MuSS-bits: sensor-display blocks for deaf people to explore musical sounds. In: Proceedings of the 28th Australian Conference on Computer-Human Interaction, pp 72–80.

[4] Personal webpage of Andreas Brand. https://andybrand.de. Last accessed: June 28, 2022.

[5] Lebenshilfe Tuttlingen. https://lebenshilfe-tuttlingen.de. Last accessed: June 28, 2022.

[6] Musiklusion Drum Set. https://www.musiklusion.de/musiklusion-schlagzeug/. Last accessed: June 28, 2022.

[7] TouchOSC. https://hexler.net/touchosc. Last accessed: June 28, 2022.

[8] Veytizou J, Magnier C, Villeneuve F, Thomann G. (2012). Integrating the human factors characterization of disabled users in a design method. Application to an interface for playing acoustic music. Association for the Advancement of Modelling and Simulation Techniques in Enterprises 73:173.

[9] Gehlhaar R, Rodrigues PM, Girão LM, Penha R. (2014). Instruments for everyone: Designing new means of musical expression for disabled creators. In: Technologies of inclusive well-being. Springer, pp 167–196.

[10] Eng N. (2017). Teaching college: The ultimate guide to lecturing, presenting, and engaging students.


About the Column

The Multidisciplinary Column is edited by Cynthia C. S. Liem and Jochen Huber. Every other edition, we feature either an interview with a researcher performing multidisciplinary work or a column written by ourselves. For this edition, we feature a column by Jochen Huber.

Editor Biographies

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.

Dr. Jochen Huber is Professor of Computer Science at Furtwangen University, Germany. Previously, he was a Senior User Experience Researcher with Synaptics and an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and the ACM International Conference on Interactive Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

Two Interviews with Renowned Dataset Researchers

This issue of the Dataset Column provides two interviews with researchers responsible for novel datasets of recent years. In particular, we first interview Nacho Reimat (https://www.cwi.nl/people/nacho-reimat), the scientific programmer responsible for CWIPC-SXR, one of the first datasets on dynamic, interactive volumetric media. Second, we interview Pierre-Etienne Martin (https://www.eva.mpg.de/comparative-cultural-psychology/staff/pierre-etienne-martin/), responsible for contributions to datasets in the area of sports and culture.

The two interviewees were asked about their contributions to dataset research, their interests, the challenges, and the future. We would like to thank both Nacho and Pierre-Etienne for agreeing to contribute to our column.

Nacho Reimat, Scientific Programmer at the Distributed and Interactive Systems group at the CWI, Amsterdam, The Netherlands

Short bio: Ignacio Reimat is currently an R&D Engineer at Centrum Wiskunde & Informatica (CWI) in Amsterdam. He received the B.S. degree in Audiovisual Systems Engineering of Telecommunications from Universitat Politecnica de Catalunya in 2016 and the M.S. degree in Innovation and Research in Informatics – Computer Graphics and Virtual Reality from Universitat Politecnica de Catalunya in 2020. His current research interests are 3D graphics, volumetric capturing, 3D reconstruction, point clouds, social Virtual Reality and real-time communications.

Could you provide a small summary of your contribution to the dataset research?

We have released the CWI Point Cloud Social XR Dataset [1], a dynamic point cloud dataset that depicts humans interacting in social XR settings. In particular, using commodity hardware we captured audio-visual data (RGB + Depth + Infrared + synchronized Audio) for a total of 45 unique sequences of people performing scripted actions [2]. The screenplays for the human actors were devised so as to simulate a variety of common use cases in social XR, namely (i) education and training, (ii) healthcare, (iii) communication and social interaction, and (iv) performance and sports. Moreover, diversity in gender, age, ethnicity, materials, textures and colours was additionally considered. As part of our release, we provide annotated raw material, the resulting point cloud sequences, and an auxiliary software toolbox to acquire, process, encode, and visualize the data, suitable for real-time applications.

Sample frames from the point cloud sequences released with the CWIPC-SXR dataset.

Why did you get interested in datasets research?

Real-time, immersive telecommunication systems are quickly becoming a reality, thanks to the advances in the acquisition, transmission, and rendering technologies. Point clouds in particular serve as a promising representation in these types of systems, offering photorealistic rendering capabilities with low complexity. Further development of transmission, coding, and quality evaluation algorithms, though, is currently hindered by the lack of publicly available datasets that represent realistic scenarios of remote communication between people in real-time. So we are trying to fill this gap. 

What is the most challenging aspect of datasets research?

In our case, because point clouds are a relatively new format, the most challenging part has been developing the technology to generate them. Our dataset is generated from several cameras, which need to be calibrated and synchronized in order to merge the views successfully. Apart from that, if you are releasing a large dataset, you also need to deal with other challenges like data hosting and maintenance, and, even more importantly, finding a way to distribute the data that is suitable for different target users. Because we are releasing not just point clouds but also the raw data, there may be people interested in the raw videos, or in particular point clouds, who do not want to download the full 1.6TB of data. And going even further, because of the novelty of the point cloud format, there is also a lack of tools to capture, play back or modify this type of data. That’s why, together with the dataset, we also released our auxiliary toolbox of software utilities built on top of the Point Cloud Library, which allows for alignment and processing of point clouds, as well as real-time capturing, encoding, transmission, and rendering.
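For readers who want to experiment with this kind of multi-camera fusion themselves, here is a minimal sketch using Open3D rather than the CWI toolbox itself; the file names and the source of the calibration matrices are assumptions, and a production pipeline would also need temporal synchronization across cameras.

```python
# Hedged sketch: fusing per-camera point cloud frames into one registered cloud.
import numpy as np
import open3d as o3d

# 4x4 extrinsic matrices mapping each camera frame into a common world frame,
# obtained from calibration (hypothetical files).
extrinsics = {
    "cam0": np.eye(4),                      # reference camera
    "cam1": np.load("cam1_extrinsic.npy"),
}

merged = o3d.geometry.PointCloud()
for cam, T in extrinsics.items():
    pcd = o3d.io.read_point_cloud(f"{cam}_frame000.ply")  # assumed file layout
    merged += pcd.transform(T)  # transform into the world frame, then accumulate

# Downsample to thin out redundant points where the camera views overlap.
merged = merged.voxel_down_sample(voxel_size=0.005)
o3d.visualization.draw_geometries([merged])
```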

How do you see the future of datasets research?

Open datasets are an essential part of science since they allow for comparison and reproducibility. The major problem is that creating datasets is difficult and expensive, requiring a big investment from research groups. In order to ensure that relevant datasets keep on being created, we need a push including: scientific venues for the publication and discussion of datasets (like the dataset track at the Multimedia Systems conference, which started more than a decade ago), investment from funding agencies and organizations identifying the datasets that the community will need in the future, and collaboration between labs to share the effort.

What are your future plans for your research?

We are very happy with the first version of the dataset, since it provides a good starting point and was a source of learning. Still, there is room for improvement, so now that we have a full capturing system (together with the auxiliary tools), we would like to extend the dataset and refine the tools. The community still needs more datasets of volumetric video to further advance research on alignment, post-processing, compression, delivery, and rendering. Apart from the dataset, the Distributed and Interactive Systems group (https://www.dis.cwi.nl) at CWI is working on volumetric video conferencing, developing a Social VR pipeline that enables users to communicate and interact more naturally. Recently, we deployed a solution for visiting museums remotely together with friends and family members (https://youtu.be/zzB7B6EAU9c), and next October we will start two EU-funded projects on this topic.


Pierre-Etienne Martin, Postdoctoral Researcher & Tech Development Coordinator, Max Planck Institute for Evolutionary Anthropology, Department of Comparative Cultural Psychology, Leipzig, Germany

Short Bio: Pierre-Etienne Martin is currently a postdoctoral researcher at the Max Planck Institute. He received his M.S. degree in 2017 from the University of Bordeaux, the Pázmány Péter Catholic University and the Autonomous University of Madrid via the Image Processing and Computer Vision Erasmus Master programme. He obtained his PhD, with the European label, from the University of Bordeaux in 2020, supervised by Jenny Benois-Pineau and Renaud Péteri, on the topic of video detection and classification by means of Convolutional Neural Networks. His current research interests include, among others, Artificial Intelligence, Machine Learning and Computer Vision.

Could you provide a small summary of your contribution to the dataset research?

In 2017, I started my PhD thesis, which focuses on movement analysis in sports. The aim of this research project, called CRISP (ComputeR vIsion for Sports Performance), is to improve the training experience of athletes. Our team decided to focus on table tennis, and it is through the collaboration with the Sports Faculty of the University of Bordeaux, STAPS, that our first contribution came to be: the TTStroke-21 dataset [3]. This dataset gathers recordings of table tennis games at high resolution and 120 frames per second. The players and annotators are both from STAPS. The annotation platform was designed by students from the LaBRI – University of Bordeaux, and the MIA laboratory of the University of La Rochelle. The recording of the videos and the annotation were coordinated by my supervisors and myself.

Since 2019 and until now, TTStroke-21 has been used for the Sports Task at the Multimedia Evaluation benchmark, MediaEval [4]. The goal is to segment and classify table tennis strokes from videos.
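To make the task framing concrete, the sketch below shows a common way to set up such stroke segmentation and classification: slide a fixed-length window over the video and let a spatio-temporal network label each window. The window length, stride and class convention are illustrative assumptions, not the task’s official baseline.

```python
# Hedged sketch: sliding-window stroke detection with a 3D CNN (assumed parameters).
import torch

WINDOW = 64  # frames per window (assumed; the videos run at 120 fps)
STRIDE = 16  # hop between consecutive windows (assumed)

def classify_strokes(frames: torch.Tensor, model: torch.nn.Module):
    """frames: (T, C, H, W) tensor of a whole video; returns (start_frame, label) pairs."""
    detections = []
    for start in range(0, frames.shape[0] - WINDOW + 1, STRIDE):
        clip = frames[start:start + WINDOW]           # (WINDOW, C, H, W)
        clip = clip.permute(1, 0, 2, 3).unsqueeze(0)  # (1, C, WINDOW, H, W) for a 3D CNN
        with torch.no_grad():
            label = model(clip).argmax(dim=1).item()
        if label != 0:  # assume class 0 means "no stroke"
            detections.append((start, label))
    return detections
```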

TTStroke-21 sample images

Since 2021, I have been at the MPI EVA institute, where I now focus on elaborating datasets for the Comparative Cultural Psychology department (CCP). The data we are working on focuses on great apes and children, which we aim to segment, identify and track.

Why did you get interested in datasets research?

Dataset research is the field that makes the application of computer vision tools possible. In order to widen the range of applications, datasets with high-quality ground truth need to be offered by the scientific community. Only then can models be developed to solve the problem raised by the dataset and finally be offered to the community. This has been the goal of the interdisciplinary CRISP project, through the collaboration of the sports and computer science communities, for improving athlete performance.

It is also the aim of collaborative projects such as MMLAB [5], which gathers many models and implementations trained on various datasets, in order to ease reproducibility, performance comparison and inference for applications.

What is the most challenging aspect of datasets research?

From my experience organizing the Sports Task at the MediaEval workshop, the most challenging aspect of dataset research is being able to provide high-quality data, from acquisition to annotation, along with tools to process them: for use, demonstration and evaluation. That is why, alongside our task, we also provide a baseline which covers most of these aspects.

How do you see the future of datasets research?

I hope dataset research will converge on a general scheme for the annotation and evaluation of datasets. I hope the different datasets can be used together for training multi-task models, giving the opportunity to share knowledge and features specific to each type of dataset. Finally, quantity has been a major criterion in dataset research, but quality should be given more consideration in order to improve state-of-the-art performance while keeping research sustainable.

What are your future plans for your research?

Within the CCP department at MPI, I hope to be able to build different types of datasets that put what has been developed in the computer vision field to best use for psychology.

Relevant references:

  1. CWIPC-SXR dataset: https://www.dis.cwi.nl/cwipc-sxr-dataset/
  2. I. Reimat, et al. “CWIPC-SXR: Point Cloud dynamic human dataset for Social XR.” In Proceedings of the 12th ACM Multimedia Systems Conference (MMSys ’21). Association for Computing Machinery, New York, NY, USA, 300–306. https://doi.org/10.1145/3458305.3478452
  3. TTStroke-21: https://link.springer.com/article/10.1007/s11042-020-08917-3
  4. MediaEval: http://www.multimediaeval.org/
  5. Open-MMLab: https://openmmlab.com/

ACM SIGMM Executive Committee Newsletter – 1, 2022


The Special Interest Group in Multimedia of ACM, ACM SIGMM, provides a forum for researchers, engineers, and practitioners in all aspects of multimedia computing, communication, storage, and applications. We do this through our sponsorship and organization of conferences and workshops, supporting student travel to such events, discounted registrations, two regional chapters, recognition of excellence and achievement through an awards scheme, and we inform the Multimedia community of our activities through the SIGMM Records, social media and through mailing lists. Information on joining SIGMM can be found at https://www.acm.org/special-interest-groups/sigs/sigmm.

The SIGMM Executive Committee Newsletter in SIGMM Records periodically reports on the topics discussed and the decisions taken in the Executive Committee meetings, to improve transparency and the sense of community.

SIGMM Executive Committee Meeting 2022-03-16

Attended: Alberto Del Bimbo (Chair); Phoebe Chen (Vice-Chair); Miriam Redi (Conference Director); Changsheng Xu, Ketan Mayer-Patel, Kiyoharu Aizawa, Pablo Cesar, Balakrishnan Prabhakaran, Qi Tian, Susanne Boll, Tao Mei, Abdulmotaleb El Saddik, Alan Smeaton (SIGMM Executive Committee members); Xavier Alameda-Pineda (Invited guest)

Sent apologies and comments: Lexing Xie (SIGMM Executive Committee member).

We discussed the 2022 SIGMM budget. The SIGMM budget is in good shape, and we foresee room for new initiatives to strengthen and expand the SIGMM community and improve our communication via existing and new channels.

We approved a revision of the SIGMM bylaws (proposed by Susanne Boll) to improve diversity: the chair and vice-chair will run for office in pairs, a way to encourage diversity without necessarily having to impose quotas. The proposal has been sent to ACM for approval.

We approved three proposals for special initiatives that will improve inclusion. In late 2021, the SIGMM Executive invited SIGMM Members to apply for funding for new initiatives building on SIGMM’s excellence and strengths, nurturing new talent in the SIGMM community, and addressing weaknesses in the SIGMM community and in SIGMM activities. The fund can support auditable expenses incurred and necessary for the completion of the initiative. The proposals received were evaluated based on impact and contribution to the SIGMM community, and cost-effectiveness of the proposed budget. The three special initiatives approved so far are:

  • Multi-City PhD-School (proposed by the Steering Committee Co-Chairs of MM Asia)
    This is a two half-day program which is planned to be implemented at ACM MM Asia and eventually applied to other conferences in the future. The program is hosted at 3-5 satellite sites located in different Asian cities. Each site will physically gather 30-50 PhD students plus 1-2 senior researchers at a local venue. The different sites are virtually connected via online meetings. Invited student speakers will deliver a 3-minute lightning talk in turn, followed by Q&A with mentors. The program allows students to physically attend the event and talk to senior researchers, while increasing the impact of satellite events among young researchers. Students are also encouraged to register for the satellite events and attend virtually. This could involve more students and minority attendees, with satellite events bringing together students from multiple cities for idea exchange and research training.
  • MMSys inclusion initiative (proposed by the MMSys’22 General Chairs & Diversity Chairs)
    The goal of this initiative is to improve diversity and inclusion in the MMSys community. The proposal includes 1) travel support for non-student participants who self-identify as marginalized and/or underrepresented and lack other funding opportunities; and 2) an EDI (Equality, Diversity and Inclusion) panel aimed at increasing the visibility and recognition of minorities and under-represented researchers in SIGMM fields, stimulating new collaborations, and promoting networking and mentoring between junior and senior researchers.
  • IMX Inclusion initiative (proposed by the IMX’22 Diversity Chairs)
    The goal of this initiative is to promote the participation of groups of students and researchers that have historically been underrepresented in the IMX community. The proposal includes funding for 1) a panel discussion on diversity in the metaverse; and 2) travel support for individuals who self-identify as marginalized and/or underrepresented in terms of gender, race, and geographical location and who lack the financial resources to attend an international conference.

The SIGMM Executive also discussed two other initiatives, namely the opportunity of using Open Review in the SIGMM flagship conference ACM Multimedia (this year it is adopted on an experimental basis at ACM Multimedia 2022), and the project of a reproducibility platform for open streaming evaluation and benchmarking (proposed by Ali Begen), eventually extendible beyond streaming media. Both will be further discussed and evaluated in the near future.

The Chairs of the SIGMM Executive Committee

JPEG Column: 95th JPEG Meeting

JPEG issues a call for proposals for JPEG Fake Media

The 95th JPEG meeting was held online from 25 to 29 April 2022. A Call for Proposals (CfP) was issued for JPEG Fake Media, which aims at a standardisation framework for the secure annotation of modifications in media assets. With this new initiative, JPEG endeavours to provide standardised means for identifying the provenance of media assets that include imaging information. Assuring the provenance of the coded information is essential considering the current trends and possibilities of multimedia technology.

The JPEG Fake Media standardisation aims at the identification of image provenance.

This new initiative complements the ongoing standardisation of machine learning based codecs for images and point clouds. Both are expected to revolutionise coding standards, leading to compression rates beyond the current state of the art.

The 95th JPEG meeting had the following highlights:

  • JPEG Fake Media issues a Call for Proposals;
  • JPEG AI;
  • JPEG Pleno Point Cloud Coding;
  • JPEG Pleno Light Fields quality assessment;
  • JPEG AIC near perceptual lossless quality assessment;
  • JPEG NFT exploration;
  • JPEG DNA explorations;
  • JPEG XS 2nd edition published;
  • JPEG XL 2nd edition.

The following summarises the major achievements of the 95th JPEG meeting.

JPEG Fake Media

At its 95th meeting, the JPEG committee issued a Final Call for Proposals (CfP) on JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall address use cases that are in good faith as well as those with malicious intent. The call for proposals welcomes contributions that address at least one of the extensive list of requirements specified in the associated “Use Cases and Requirements for JPEG Fake Media” document. Proponents are highly encouraged to express their interest in submitting a proposal before 20 July 2022 and to submit their final proposal before 19 October 2022. Full details about the timeline, submission requirements and evaluation processes are documented in the CfP available on jpeg.org.

JPEG AI

Following the JPEG AI joint ISO/IEC/ITU-T Call for Proposals issued after the 94th JPEG committee meeting, 14 registrations were received, among which 12 codecs were submitted for the standard reconstruction task. For computer vision and image processing tasks, several teams submitted compressed-domain decoders, notably 6 for image classification. Prior to the 95th JPEG meeting, the work focused on the management of the Call for Proposals submissions, the creation of the test sets, and the generation of anchors for the standard reconstruction, image processing and computer vision tasks. Moreover, a dry run of the subjective evaluation of the JPEG AI anchors was performed with expert subjects, and the results were analysed during this meeting, followed by additions and corrections to the JPEG AI Common Training and Test Conditions and the definition of several recommendations for the evaluation of the proposals, notably the selection of anchors, images and bitrates. A procedure for cross-check evaluation was also discussed and approved. The work will now focus on the evaluation of the Call for Proposals submissions, which is expected to be finalized at the 96th JPEG meeting.

JPEG Pleno Point Cloud Coding

JPEG Pleno is working towards the integration of various modalities of plenoptic content under a single and seamless framework. Efficient and powerful point cloud representation is a key feature within this vision. Point cloud data supports a wide range of applications for human and machine consumption including metaverse, autonomous driving, computer-aided manufacturing, entertainment, cultural heritage preservation, scientific research and advanced sensing and analysis. During the 95th JPEG meeting, the JPEG Committee reviewed the responses to the Final Call for Proposals on JPEG Pleno Point Cloud Coding. Four responses have been received from three different institutions. At the upcoming 96th JPEG meeting, the responses to the Call for Proposals will be evaluated with a subjective quality evaluation and objective metric calculations.

JPEG Pleno Light Field

The JPEG Pleno standard tools provide a framework for coding new imaging modalities derived from representations inspired by the plenoptic function. The image modalities addressed by the current standardization activities are light field, holography, and point clouds, where these image modalities describe different sampled representations of the plenoptic function. Therefore, to properly assess the quality of these plenoptic modalities, specific subjective and objective quality assessment methods need to be designed.

In this context, JPEG has launched a new standardisation effort known as JPEG Pleno Quality Assessment. It aims at providing a quality assessment standard, defining a framework that includes subjective quality assessment protocols and objective quality assessment procedures for lossy decoded data of plenoptic modalities for multiple use cases and requirements. The first phase of this effort will address the light field modality.

To assist this task, JPEG has issued the “JPEG Pleno Draft Call for Contributions on Light Field Subjective Quality Assessment”, to collect new procedures and best practices with regard to light field subjective quality assessment methodologies to assess artefacts induced by coding algorithms. All contributions, which can be test procedures, datasets, and any additional information, will be considered to develop the standard by consensus among the JPEG experts following a collaborative process approach.

The Final Call for Contributions will be issued at the 96th JPEG meeting. The deadline for submission of contributions is 18 December 2022.

JPEG AIC

During the 95th JPEG Meeting, the committee released the Draft Call for Contributions on Subjective Image Quality Assessment.

The new JPEG AIC standard will be developed considering all the submissions to the Call for Contributions in a collaborative process. The deadline for submissions is set for 14 October 2022. Multiple types of contributions are accepted, notably subjective assessment methods including supporting evidence and a detailed description, test material, interchange formats, software implementations, criteria and protocols for evaluation, additional relevant use cases and requirements, and any relevant evidence or literature.

The JPEG AIC committee has also started preparing a workshop on subjective assessment methods for the investigated quality range, which will be held at the end of June. The workshop aims to obtain different views on the problem and will include both internal and external speakers, as well as a Q&A panel. Experts in the field of quality assessment and stakeholders interested in the use cases are invited.

JPEG NFT

After the joint JPEG NFT and JPEG Fake Media workshops, it became evident that, even though the use cases of the two topics differ, there is a significant overlap in terms of requirements and relevant solutions. For that reason, it was decided to create a single AHG covering both the JPEG NFT and JPEG Fake Media explorations. The newly established AHG JPEG Fake Media and NFT will use the JPEG Fake Media mailing list.

JPEG DNA

The JPEG Committee has continued its exploration of the coding of images in quaternary representations, which are particularly suitable for DNA storage applications. The scope of JPEG DNA is the creation of a standard for efficient coding of images that considers biochemical constraints and offers robustness to the noise introduced by the different stages of a storage process based on synthetic DNA polymers. A new version of the overview document on DNA-based Media Storage: State-of-the-Art, Challenges, Use Cases and Requirements was issued and has been made publicly available. It was decided to continue this exploration by validating and extending the JPEG DNA benchmark codec to simulate an end-to-end image storage pipeline using DNA, including biochemical noise simulation, for future exploration experiments. During the 95th JPEG meeting, a new document describing the Use Cases and Requirements for DNA-based Media Storage was created and has been made publicly available. A timeline for the standardization process was also defined. Interested parties are invited to consider joining the effort by registering to the JPEG DNA AHG mailing list.
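To illustrate what a quaternary representation means in practice, the toy sketch below packs two bits of data into each nucleotide. A real JPEG DNA codec must go well beyond this naive mapping: it has to respect the biochemical constraints mentioned above (e.g. avoiding long homopolymer runs and keeping GC content balanced) and add robustness to synthesis, storage and sequencing noise.

```python
# Toy sketch of base-4 (quaternary) coding: 2 bits per nucleotide, no constraints.
BASES = "ACGT"

def bytes_to_dna(data: bytes) -> str:
    nucleotides = []
    for byte in data:
        for shift in (6, 4, 2, 0):  # four 2-bit symbols per byte, MSB first
            nucleotides.append(BASES[(byte >> shift) & 0b11])
    return "".join(nucleotides)

def dna_to_bytes(strand: str) -> bytes:
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for ch in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(ch)
        out.append(byte)
    return bytes(out)

assert dna_to_bytes(bytes_to_dna(b"JPEG")) == b"JPEG"  # lossless round trip
```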

JPEG XS

The JPEG Committee is pleased to announce that the 2nd editions of Part 1 (Core coding system), Part 2 (Profiles and buffer models), and Part 3 (Transport and container formats) were published in March 2022. Furthermore, the committee finalized the work on Part 4 (Conformance testing) and Part 5 (Reference software), which are now entering the final phase for publication. With these last two parts, the committee’s work on the 2nd edition of the JPEG XS standards comes to an end, allowing the focus to shift to further improving the standard. Meanwhile, in response to the latest Use Cases and Requirements for JPEG XS v3.1, the committee received a number of technology proposals from Fraunhofer and intoPIX that focus on improving the compression performance for desktop content sequences. The proposals will now be evaluated and thoroughly tested, and will form the foundation of the work towards a 3rd edition of the JPEG XS suite of standards. The primary goal of the 3rd edition is to deliver the same image quality as the 2nd edition, but at half of the required bandwidth.

JPEG XL

The second edition of JPEG XL Part 1 (Core coding system), with an improved numerical stability of the edge-preserving filter and numerous editorial improvements, has proceeded to the CD stage. Work on a second edition of Part 2 (File format) was initiated. Hardware coding was also further investigated. Preliminary software support has been implemented in major web browsers, image viewing and editing software, including popular tools such as FFmpeg, ImageMagick, libvips, GIMP, GDK and Qt. JPEG XL is now ready for wide-scale adoption.

Final Quote

“Recent developments in the creation and modification of visual information call for the development of tools that can help protect the authenticity and integrity of media assets. JPEG Fake Media is a standardised framework to deal with imaging provenance,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

Upcoming JPEG meetings are planned as follows:

  • No. 96 will be held online during 25-29 July 2022.