
Immersive imaging technologies can transform how we experience and interact with remote environments, i.e., telepresence. By leveraging advancements in light field imaging, omnidirectional cameras, and head-mounted displays, these systems enable realistic, real-time visual experiences that can change how we interact with remote scenes in fields such as healthcare, education, remote collaboration, and entertainment. However, the field faces significant technical and experiential challenges, including efficient data capture and compression, real-time rendering, and quality of experience (QoE) assessment. Expanding on the findings of the authors’ recent publication and situating them within a broader theoretical framework, this article provides an integrated overview of immersive telepresence technologies, focusing on their technological foundations, applications, and the challenges that must be addressed to advance this field.
1. Redefining Telepresence Through Immersive Imaging
Telepresence is defined as the “sense of being physically present at a remote location through interaction with the system’s human interface” [Minsky1980]. Such virtual presence is made possible by digital imaging systems and real-time communication of visuals and interaction signals. Immersive imaging systems such as light fields and omnidirectional imaging enhance the visual sense of presence, i.e., “being there” [IJsselsteijn2000], with photorealistic recreation of the remote scene. This emerging field has seen rapid growth in both research and development [Valenzise2022], due to advancements in imaging and display technologies combined with increasing demand for interactive and immersive experiences. Figure 1 contrasts a telepresence system that uses traditional cameras and controls with an immersive telepresence system.

The experience of “presence” consists of three components according to Schubert et al. [Schubert2001], which are renamed in this article to take into account other definitions:
- Realness – “Realness” [Schubert2001] or “realism” [Takatalo2008] of the environment (i.e., in this case, the remote scene) relates to the “believability, the fidelity and validity of sensory features within the generated environments, e.g., photorealism” [Perkis2020].
- Immersion – User’s level of “involvement” [Schubert2001] and “concentration to the virtual environment instead of real world, loss of time” [Takatalo2008]. “The combination of sensory cues with symbolic cues essential for user emplacement and engagement” [Perkis2020].
- Spatiality – An attribute of the environment that helps in “transporting” the user and inducing spatial awareness [Schubert2001], which allows “spatial presence” [Takatalo2008] and “the possibility for users to move freely and discover the world offered” [Perkis2020].
Immersion can happen without realness or spatiality, for example, while we are reading a novel. Telepresence using traditional imaging systems might not be immersive when the display is relatively small and other distractors are present in the visual field. Realistic immersive telepresence necessitates higher degrees of freedom (e.g., 3DoF+ or 6DoF) compared to a telepresence application with a traditional display. In this context, new view synthesis methods and spherical light field representations (cf. Section 3) will be crucial in giving correct depth cues and depth perception, which would substantially increase realness and spatiality.
The rapid progress of immersive imaging technologies and their adoption can largely be attributed to advancements in processing and display systems, including light field displays and extended reality (XR) headsets. These XR headsets are becoming increasingly affordable while delivering excellent user experiences [Jackson2023], paving the way for the widespread adoption of immersive communication and telepresence applications in the near future. To further accelerate this transition, extensive efforts are being undertaken in both academia and industry.
The visual realism (i.e., realness) in realistic immersive telepresence relies on acquired photos rather than computer-generated imagery (CGI). In healthcare, it enables realistic remote consultations and surgical collaborations [Wisotzky2025]. In education and training, it facilitates immersive, location-independent learning environments [Kachach2021]. Similarly, visual realism can enhance remote collaboration by creating lifelike meeting spaces, while in media and entertainment, it can bring greater realism to live events and performances, giving users a closer connection and a feeling of being present at remote sites.
This article provides a brief overview of the technological foundations, applications, and challenges in immersive telepresence. The novel contribution of this article is setting up the theoretical framework for realistic immersive telepresence informed by prior literature and positioning the findings of the author’s recent publication [Zerman2024] within this broader theoretical framework. It explores how foundational technologies like light field imaging and real-time rendering drive the field forward, while also identifying critical obstacles, such as dataset availability, compression efficiency, and QoE evaluation.
2. Technological Foundations for Immersive Telepresence
Realistic immersive telepresence can be made possible by enabling its main defining factors: realness (e.g., photorealism), immersion, and spatiality. Although these factors can also be satisfied through other modalities (e.g., spatial audio), this article focuses on the visual modality and the visual recreation of the remote scene.
2.1 Immersive Imaging Modalities
Immersive imaging technologies encompass a wide range of methods aimed at capturing and recreating realistic visual and spatial experiences. These include light fields, omnidirectional images, volumetric videos using either point clouds or 3D meshes, holography, multi-view stereo imaging, neural radiance fields, Gaussian splats, and other extended reality (XR) applications — all of which contribute to recreating highly realistic and interactive representations of scenes and environments.
Light fields (LF) are vector fields of all the light rays passing through a given region in space, describing the intensity and direction of light at every point. This is fully described through the plenoptic function [Adelson1991] as follows: P(x,y,z,θ,ϕ,λ,t), where x, y, and z describe the 3D position of sampling, θ and ϕ are the angular direction, λ is the wavelength of the light ray, and t is time. Traditionally, LFs are represented using the two-plane parametrization [Levoy1996] with 2 spatial dimensions and 2 angular dimensions; however, this parametrization limits the use case of LFs to processing planar visual stimuli. The plenoptic function can be leveraged beyond the two-plane parameterization for a highly detailed view reconstruction or view synthesis. Newer capture scenarios and representations enable increased immersion with LFs [Overbeck2018],[Broxton2020], which can be further advanced in the future.
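The two-plane parametrization mentioned above can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the function name `ray_to_two_plane` and the plane positions `z_uv` and `z_st` are hypothetical choices, not taken from [Levoy1996]. It maps a ray to the four coordinates (u, v, s, t) given by its intersections with two parallel planes, and it also shows one reason why this parametrization suits planar setups only: a ray travelling parallel to the planes has no representation.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def ray_to_two_plane(origin: Vec3, direction: Vec3,
                     z_uv: float = 0.0, z_st: float = 1.0
                     ) -> Optional[Tuple[float, float, float, float]]:
    """Return (u, v, s, t) for a ray, or None if it is parallel to the planes.

    (u, v) is the intersection with the plane z = z_uv and (s, t) is the
    intersection with the plane z = z_st. Plane positions are assumptions
    made for this sketch.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-12:
        # The ray never crosses the planes, so it cannot be indexed.
        return None
    a = (z_uv - oz) / dz  # ray parameter at the first plane
    b = (z_st - oz) / dz  # ray parameter at the second plane
    return (ox + a * dx, oy + a * dy, ox + b * dx, oy + b * dy)


# A ray leaving the origin diagonally crosses both planes.
diagonal = ray_to_two_plane((0.0, 0.0, 0.0), (1.0, 0.0, 1.0))
# A sideways ray is parallel to the planes and cannot be represented.
sideways = ray_to_two_plane((0.0, 0.0, 0.5), (1.0, 0.0, 0.0))
```

Wavelength and time are left out of this sketch; in practice they are usually handled by storing color channels per ray and one light field per frame.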
Omnidirectional image (or video) representation can provide an all-encompassing 360-degree view of a scene from a point in space for immersive visualization [Yagi1999], [Maugey2023]. This is made possible by stitching multiple views together. The resulting spherical image can be stored using traditional image formats (i.e., 2D planar formats) by projecting the sphere onto a plane (e.g., equirectangular projection, cubemap projection, and others); however, processing these representations without proper consideration for their spherical nature results in errors or biases.
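As an illustration of the equirectangular projection and of the bias it can introduce, the following sketch maps a viewing direction to image coordinates and computes a per-row weight proportional to the solid angle a pixel covers. The axis convention (+z forward, +x right, +y up) and both function names are assumptions made for this example; real systems may use different conventions.

```python
import math
from typing import Tuple


def direction_to_equirect(x: float, y: float, z: float,
                          width: int, height: int) -> Tuple[float, float]:
    """Map a 3D viewing direction to (column, row) in an equirectangular image.

    Assumed convention: +z is forward (image centre), +x is right, +y is up.
    """
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    lon = math.atan2(x, z)      # longitude in [-pi, pi]
    lat = math.asin(y / norm)   # latitude in [-pi/2, pi/2]
    col = (lon / (2.0 * math.pi) + 0.5) * width
    row = (0.5 - lat / math.pi) * height
    return col, row


def row_solid_angle_weight(row: int, height: int) -> float:
    """Relative solid angle of a pixel in the given row (cosine of its latitude)."""
    lat = (0.5 - (row + 0.5) / height) * math.pi
    return math.cos(lat)


# The forward direction lands in the image centre.
centre = direction_to_equirect(0.0, 0.0, 1.0, 2048, 1024)
# Rows near the poles cover far less of the sphere than rows at the equator.
pole_weight = row_solid_angle_weight(0, 1024)
equator_weight = row_solid_angle_weight(511, 1024)
```

Treating every pixel equally (for example, in an unweighted error average) over-counts the polar regions; weighting by such a cosine term is one common way to compensate.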
2.2 Processing Requirements for Realistic Immersive Telepresence
Immersive telepresence relies on capturing, transmitting, and rendering realistic representations of remote environments. “Capturing” can be considered an inherent part of the imaging modalities discussed in the previous section. For transmitting and rendering, there are different requirements to take into account.
Compression is an important step for telepresence, which relies heavily on real-time transmission of visual data from the remote scene. It matters even more for immersive telepresence, because immersive imaging modalities capture and represent more information than traditional 2D imaging systems and therefore need stronger compression. Compression of LFs [Stepanov2023], omnidirectional images and video [Croci2020], other forms of immersive video such as MPEG Immersive Video [Boyce2021], volumetric 3D representations based on point clouds [Graziosi2020], and textured 3D meshes [Marvie2022] has been an active research topic over the last decade, which has led to the standardization of compression methods for some immersive imaging modalities.
Rendering [Eisert2023], [Maugey2023] is another important aspect, especially for LFs [Overbeck2018]. The LF data needs to be rendered correctly for the position of the viewer (i.e., by rendering interpolated or extrapolated views) to provide a realistic and immersive experience. Without such view rendering, the displayed visuals will appear jittery, which makes it harder to sustain the “suspension of disbelief” necessary for an immersive experience. Furthermore, this rendering has to run in real time, as telepresence requires it. Although technologies such as GPU acceleration and advanced compression algorithms help keep interaction seamless and latency low, the quality and realness of the rendered remote scene remain open problems.
Immersive telepresence systems rely on specialized hardware, including omnidirectional cameras, head-mounted displays, and motion tracking systems. These components must work in harmony to deliver high-quality, immersive experiences. Falling prices and increasing availability of such specialized devices make them easier to deploy in industrial settings [Jackson2023] regardless of business size and enable the democratization of immersive imaging applications in a broader sense.
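A back-of-the-envelope calculation can make the scale of the problem concrete. The figures below (4K resolution, 30 frames per second, 24 bits per pixel, a 5×5 grid of views) are illustrative assumptions and do not come from the cited works; the function name is likewise hypothetical.

```python
def raw_bitrate_gbps(width: int, height: int, views: int = 1,
                     fps: int = 30, bits_per_pixel: int = 24) -> float:
    """Uncompressed bitrate in gigabits per second for a multi-view video."""
    return width * height * views * fps * bits_per_pixel / 1e9


# One conventional 4K camera: roughly 6 Gbit/s before compression.
single_view = raw_bitrate_gbps(3840, 2160)
# A hypothetical 5x5 light field of the same cameras: 25 times as much.
light_field = raw_bitrate_gbps(3840, 2160, views=25)
```

Under these assumptions the raw light field stream is about 149 Gbit/s, which illustrates why exploiting the redundancy between views is central to immersive telepresence.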
3. Efforts in Creating a Realistic Immersive Telepresence Experience
Creating an immersive telepresence system has been the topic of many scholarly studies. These include frameworks for group-to-group telepresence [Beck2013], capture and delivery frameworks for volumetric 3D models [Fechteler2013], and various other social XR applications [Cortés2024]. Google’s Project Starline can also be mentioned here, as it incorporates realness and immersion in its delivery of visuals to create an immersive experience [Lawrence2024], [Starline2025], although its main functionality is interpersonal video communication. In supporting realness, LFs [Broxton2020] and neural representations [Suhail2022] can synthesize views that reproduce reflections and similar non-Lambertian light-material interactions in the remote scene, whereas textured reconstructions of 3D objects usually assume Lambertian materials [Zhi2020].
Light field reconstruction [Gond2023] and new view synthesis from a single view [Lin2023] or sparse views [Chibane2021] can be a valid way to approach creating realistic immersive telepresence experiences. Several representations can be used to recreate the views needed to support user movement and the spatial awareness factor of presence in the remote scene. These representations include the Multiplane Image (MPI) [Srinivasan2019], the Multi-Cylinder Image (MCI) [Waidhofer2022], the layered mesh representation [Broxton2020], and neural representations [Chibane2021], [Lin2023], [Gond2023], all of which rely on structured or unstructured 2D image captures of the remote scene.
Another way of creating a realistic immersive experience is to combine different imaging modalities, i.e., omnidirectional content and light fields, in the form of spherical light fields (SLFs). SLFs then enable rendering and view synthesis that can generate more realistic and immersive content. There have been various attempts to create SLFs by collecting linear captures vertically [Krolla2014], capturing omnidirectional content from the scene with multiple cameras [Maugey2019], and moving a single camera in a circular trajectory and utilizing deep neural networks to generate an image grid [Lo2023]. Nevertheless, these works either did not yield publicly available datasets or did not have precise localizations of the cameras. To address this, the Spherical Light Field Database (SLFDB) was introduced in previous work [Zerman2024], providing a foundational dataset for developing and testing realistic immersive telepresence applications.
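To give a feel for how layered representations such as the MPI turn into an image, the sketch below composites the RGBA samples of a single pixel from the farthest layer to the nearest one using the standard "over" operator. It is a simplified illustration, not the pipeline of any cited method: the warping of each layer into the target viewpoint, which a full renderer needs, is omitted, and the function name is hypothetical.

```python
from typing import Sequence, Tuple

RGBA = Tuple[float, float, float, float]


def composite_back_to_front(layers: Sequence[RGBA]) -> Tuple[float, float, float]:
    """Blend one pixel's RGBA samples, ordered from farthest to nearest layer.

    Color and alpha values are assumed to lie in [0, 1] and colors are
    assumed not to be premultiplied by alpha.
    """
    r = g = b = 0.0
    for layer_r, layer_g, layer_b, alpha in layers:
        # "Over" operator: the nearer layer covers what lies behind it.
        r = layer_r * alpha + r * (1.0 - alpha)
        g = layer_g * alpha + g * (1.0 - alpha)
        b = layer_b * alpha + b * (1.0 - alpha)
    return r, g, b


# An opaque red background seen through a half-transparent blue layer.
pixel = composite_back_to_front([(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.5)])
```

Because each layer sits at a fixed depth, content that falls between layers has to be approximated by neighbouring layers, which is one plausible source of layer-like artifacts when the viewpoint moves far from the capture position.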
4. Challenges and Limitations
Studies on creating realistic immersive telepresence environments have shown that certain challenges and limitations still need to be addressed to improve QoE and immersive media experience (IMEx) for these systems. These challenges include dataset availability, compression of structured and unstructured LFs, new view synthesis and rendering, and QoE estimation. Most of these challenges are also discussed in our recent study [Zerman2024].

Datasets relevant to realistic immersive telepresence tasks, such as the SLFDB [Zerman2024], are crucial for developing and validating immersive telepresence technologies. However, creating and using such datasets, which need precise spatial and angular sampling and very precise camera positioning, faces significant hurdles. Traditional camera grid setups are ineffective for capturing spherical light fields due to occlusions. This necessitates static scenes and meticulous camera positioning for a consistent capture of the scene. A dynamic scene brings a risk of inconsistent views within the same light field, as shown in Figure 2. Additionally, variations in lighting present significant challenges when capturing spherical light fields, as they affect the scene’s dynamic range, white balance, and color grading. Brightness and color variations, such as sunlight’s yellow tint compared to cloudy daylight, are not easy to correct and often require advanced algorithms for adjustment. Capturing static outdoor scenes therefore remains a challenge for future work, as they still encounter lighting-related issues despite lacking movement. Together, these challenges highlight the critical need for innovative approaches to spherical light field dataset generation and sharing to ensure future advancements in the field.
LF compression is another challenge that requires attention once imaging modalities are combined. The JPEG Pleno compression algorithm [ISO2021] is designed for two-dimensional, grid-like structured LFs (e.g., LFs captured by a microlens array or structured camera grids) and does not work for linear or unstructured captures. The situation is the same for many other compression methods, as most of them require some form of structured representation. Considering how well scene regression and other new view synthesis algorithms can adapt to unstructured inputs, advancing compression for unstructured LFs (e.g., the volume of light captured by cameras in various positions or in-the-wild user captures) is equally important. Furthermore, such an LF compression method needs to run in real time to support immersive telepresence applications while delivering a visual QoE high enough not to impede realism.

Current new view synthesis methods are primarily designed to handle small baselines, typically just a few centimeters, and face significant challenges when applied to larger baselines required in telepresence applications. Challenges such as ghosting artifacts and unrealistic distortions (e.g., nonlinear distortions, stretching) occur when interpolating views, particularly for larger baselines, as shown in Figure 3. A recent comparative evaluation of PanoSynthVR and 360ViewSynth [Zerman2024] reveals that while 360ViewSynth marginally outperforms PanoSynthVR on average quality metrics, the scores for both methods remain suboptimal. PanoSynthVR struggles with large baselines, exhibiting prominent layer-like ghosting artifacts due to limitations in its MCI structure. Although 360ViewSynth produces visually better results, closer inspection shows that it distorts object perspectives by stretching them rather than accurately rendering the scene, leading to an unnatural user experience. These findings underscore the limitations of current state-of-the-art view synthesis methods for SLFs and highlight the complexity of addressing larger baselines effectively in view synthesis.
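A simple geometric relation helps explain why larger baselines are harder. Under a simplifying assumption of two rectified pinhole cameras (not the spherical setup evaluated in [Zerman2024]), the image-space shift of a scene point between the two views is the focal length times the baseline divided by the depth. The numbers and the function name below are hypothetical and serve only to show how the shift scales.

```python
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Disparity in pixels between two rectified pinhole views of one point."""
    if depth_m <= 0.0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


# An object 2 m away, seen with an assumed focal length of 1000 pixels.
small_baseline = disparity_px(1000.0, 0.05, 2.0)  # cameras 5 cm apart
large_baseline = disparity_px(1000.0, 0.50, 2.0)  # cameras 50 cm apart
```

In this example, moving from a 5 cm to a 50 cm baseline increases the shift from about 25 to about 250 pixels, so an interpolation method has to bridge much larger displacements and disocclusions, which is consistent with the ghosting and stretching described above.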
Assessing user satisfaction and immersion in telepresence systems is a multidimensional challenge, requiring assessments along the three strands described in the IMEx white paper: subjective assessment, behavioral assessment, and assessment via psycho-physiological methods [Perkis2020]. In a user study, quantitative metrics can capture interaction latency and task performance, while individual preferences and experiences can be collected qualitatively. Certain aspects of user experience, such as visual quality and user engagement, can also be collected as quantitative data through user self-reporting. Additionally, behavioral assessment (e.g., user movement, interaction patterns) can be used to identify different use patterns. The main limiting factor is the time and experience required to run such user studies. The challenge, therefore, is to prepare a framework that can model the user experience for realistic immersive telepresence scenarios and thereby speed up assessment.
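For the subjective strand, self-reported ratings are often summarized as a mean opinion score (MOS) with a confidence interval. The sketch below assumes this summary rather than any procedure prescribed by the sources above; the ratings are made-up values on a five-point scale, the function name is hypothetical, and the interval uses a normal approximation (z = 1.96), which is rough for small panels.

```python
import math
import statistics
from typing import Sequence, Tuple


def mos_with_ci(ratings: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean opinion score, half-width of an approximate 95% interval)."""
    if len(ratings) < 2:
        raise ValueError("at least two ratings are needed")
    mean = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings from five participants for one stimulus.
mos, ci = mos_with_ci([4, 5, 3, 4, 4])
```

A wide interval relative to the scale signals that more participants are needed, which is part of why such studies are costly to run.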
Other limitations and aspects to consider include accessibility, privacy issues, and ethics. Regarding accessibility, it is important to ensure that immersive telepresence technologies are affordable and usable by diverse populations. The situation is improving as the cameras and headsets are getting cheaper and easier to use (e.g., faster and stronger on-device processing, removal of headset connection cables, increased ease of use with hand gestures, etc.). Nevertheless, hardware costs, connectivity requirements, and usability barriers must be further addressed to make these systems widely accessible. Regarding privacy and ethics, the realistic nature of immersive telepresence may raise ethical and privacy concerns. Capturing and transmitting live environments may involve sensitive data, necessitating robust privacy safeguards and ethical guidelines to prevent misuse. Also, privacy concerns regarding the headsets that rely on visual cameras for localization and mapping must be addressed.
5. Conclusions and Future Directions
Realistic immersive telepresence systems represent a transformative shift in how people interact with remote environments. By combining advanced imaging, rendering, and interaction technologies, these systems promise to revolutionize industries ranging from healthcare to entertainment. However, significant challenges remain, including data availability, compression, rendering, and QoE assessment. Addressing these obstacles will require collaboration across disciplines and industries.
To address these challenges, future research should focus on creating relevant datasets for spherical LFs that provide accurate camera positioning and address challenges such as dynamic lighting conditions and occlusions. Developing real-time, robust compression methods for unstructured LFs, which maintain visual quality and support immersive applications, is another critical area. Developing advanced view synthesis algorithms capable of handling large baselines without introducing artifacts or distortions, and creating frameworks and methodologies for user experience and QoE assessment, are still open research questions.
Further into the future, the remaining challenges could be addressed along three lines: using learning-based algorithms for the challenges related to the realness and spatiality factors as well as QoE estimation; increasing the level of interactivity and the feeling of immersion by integrating additional senses into existing systems (e.g., spatial audio, haptics, natural interfaces); and increasing standardization to create common frameworks that manage interoperability across different systems. Long-term goals include the integration of realistic immersive displays, such as LF displays or improved holographic displays, and the convergence of telepresence systems with emerging technologies like 5G or 6G networks and edge computing, for which efforts are already underway [Mahmoud2023].
References
- [Adelson1991] Adelson, E. H., & Bergen, J. R. (1991). The plenoptic function and the elements of early vision (Vol. 2). Cambridge, MA, USA: Vision and Modeling Group, Media Laboratory, Massachusetts Institute of Technology.
- [Beck2013] Beck, S., Kunert, A., Kulik, A., & Froehlich, B. (2013). Immersive group-to-group telepresence. IEEE transactions on visualization and computer graphics, 19(4), 616-625.
- [Boyce2021] Boyce, J. M., Doré, R., Dziembowski, A., Fleureau, J., Jung, J., Kroon, B., … & Yu, L. (2021). MPEG immersive video coding standard. Proceedings of the IEEE, 109(9), 1521-1536.
- [Broxton2020] Broxton, M., Flynn, J., Overbeck, R., Erickson, D., Hedman, P., Duvall, M., … & Debevec, P. (2020). Immersive light field video with a layered mesh representation. ACM Transactions on Graphics (TOG), 39(4), 86-1.
- [Chibane2021] Chibane, J., Bansal, A., Lazova, V., & Pons-Moll, G. (2021). Stereo radiance fields (SRF): Learning view synthesis for sparse views of novel scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 7911-7920).
- [Cortés2024] Cortés, C., Pérez, P., & García, N. (2023). Understanding latency and qoe in social xr. IEEE Consumer Electronics Magazine.
- [Croci2020] Croci, S., Ozcinar, C., Zerman, E., Knorr, S., Cabrera, J., & Smolic, A. (2020). Visual attention-aware quality estimation framework for omnidirectional video using spherical Voronoi diagram. Quality and User Experience, 5, 1-17.
- [Eisert2023] Eisert, P., Schreer, O., Feldmann, I., Hellge, C., & Hilsmann, A. (2023). Volumetric video – acquisition, interaction, streaming and rendering. In Immersive Video Technologies (pp. 289-326). Academic Press.
- [Fechteler2013] Fechteler, P., Hilsmann, A., Eisert, P., Broeck, S. V., Stevens, C., Wall, J., … & Zahariadis, T. (2013, June). A framework for realistic 3D tele-immersion. In Proceedings of the 6th International Conference on Computer Vision/Computer Graphics Collaboration Techniques and Applications.
- [Gond2023] Gond, M., Zerman, E., Knorr, S., & Sjöström, M. (2023, November). LFSphereNet: Real Time Spherical Light Field Reconstruction from a Single Omnidirectional Image. In Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production (pp. 1-10).
- [Graziosi2020] Graziosi, D., Nakagami, O., Kuma, S., Zaghetto, A., Suzuki, T., & Tabatabai, A. (2020). An overview of ongoing point cloud compression standardization activities: Video-based (V-PCC) and geometry-based (G-PCC). APSIPA Transactions on Signal and Information Processing, 9, e13.
- [IJsselsteijn2000] IJsselsteijn, W. A., De Ridder, H., Freeman, J., & Avons, S. E. (2000, June). Presence: concept, determinants, and measurement. In Human Vision and Electronic Imaging V (Vol. 3959, pp. 520-529). SPIE.
- [ISO2021] ISO/IEC 21794-2:2021 (2021) Information technology – Plenoptic image coding system (JPEG Pleno) — Part 2: Light field coding.
- [Jackson2023] Jackson, A. (2023, September) Meta Quest 3: Can businesses use VR day-to-day?, Technology Magazine. https://technologymagazine.com/digital-transformation/meta-quest-3-can-businesses-use-vr-day-to-day, Accessed: 2024-02-05.
- [Kachach2021] Kachach, R., Orduna, M., Rodríguez, J., Pérez, P., Villegas, Á., Cabrera, J., & García, N. (2021, July). Immersive telepresence in remote education. In Proceedings of the International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE’21) (pp. 21-24).
- [Krolla2014] Krolla, B., Diebold, M., Goldlücke, B., & Stricker, D. (2014, September). Spherical Light Fields. In BMVC (No. 67.1–67.12).
- [Lawrence2024] Lawrence, J., Overbeck, R., Prives, T., Fortes, T., Roth, N., & Newman, B. (2024). Project starline: A high-fidelity telepresence system. In ACM SIGGRAPH 2024 Emerging Technologies (pp. 1-2).
- [Levoy1996] Levoy, M. & Hanrahan, P. (1996) Light field rendering, in Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (pp. 31-42), New York, NY, USA, Association for Computing Machinery.
- [Lin2023] Lin, K. E., Lin, Y. C., Lai, W. S., Lin, T. Y., Shih, Y. C., & Ramamoorthi, R. (2023). Vision transformer for nerf-based view synthesis from a single input image. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 806-815).
- [Lo2023] Lo, I. C., & Chen, H. H. (2023). Acquiring 360° Light Field by a Moving Dual-Fisheye Camera. IEEE Transactions on Image Processing.
- [Mahmoud2023] Mahmood, A., Abedin, S. F., O’Nils, M., Bergman, M., & Gidlund, M. (2023). Remote-timber: an outlook for teleoperated forestry with first 5g measurements. IEEE Industrial Electronics Magazine, 17(3), 42-53.
- [Marvie2022] Marvie, J. E., Krivokuća, M., Guede, C., Ricard, J., Mocquard, O., & Tariolle, F. L. (2022, September). Compression of time-varying textured meshes using patch tiling and image-based tracking. In 2022 10th European Workshop on Visual Information Processing (EUVIP) (pp. 1-6). IEEE.
- [Maugey2019] Maugey, T., Guillo, L., & Cam, C. L. (2019, June). FTV360: A multiview 360° video dataset with calibration parameters. In Proceedings of the 10th ACM Multimedia Systems Conference (pp. 291-295).
- [Maugey2023] Maugey, T. (2023). Acquisition, representation, and rendering of omnidirectional videos. In Immersive Video Technologies (pp. 27-48). Academic Press.
- [Minsky1980] Minsky, M. (1980). Telepresence. Omni, pp. 45-51.
- [Overbeck2018] Overbeck, R. S., Erickson, D., Evangelakos, D., Pharr, M., & Debevec, P. (2018). A system for acquiring, processing, and rendering panoramic light field stills for virtual reality. ACM Transactions on Graphics (TOG), 37(6), 1-15.
- [Perkis2020] Perkis, A., Timmerer, C., et al. (2020, May) “QUALINET White Paper on Definitions of Immersive Media Experience (IMEx)”, European Network on Quality of Experience in Multimedia Systems and Services, 14th QUALINET meeting (online), Online: https://arxiv.org/abs/2007.07032
- [Schubert2001] Schubert, T., Friedmann, F., & Regenbrecht, H. (2001). The experience of presence: Factor analytic insights. Presence: Teleoperators & Virtual Environments, 10(3), 266-281.
- [Srinivasan2019] Srinivasan, P. P., Tucker, R., Barron, J. T., Ramamoorthi, R., Ng, R., & Snavely, N. (2019). Pushing the boundaries of view extrapolation with multiplane images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 175-184).
- [Starline2025] Project Starline: Be there from anywhere with our breakthrough communication technology. (n.d.). Online: https://starline.google/. Accessed: 2025-01-14
- [Stepanov2023] Stepanov, M., Valenzise, G., & Dufaux, F. (2023). Compression of light fields. In Immersive Video Technologies (pp. 201-226). Academic Press.
- [Suhail2022] Suhail, M., Esteves, C., Sigal, L., & Makadia, A. (2022). Light field neural rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 8269-8279).
- [Takatalo2008] Takatalo, J., Nyman, G., & Laaksonen, L. (2008). Components of human experience in virtual environments. Computers in Human Behavior, 24(1), 1-15.
- [Valenzise2022] Valenzise, G., Alain, M., Zerman, E., & Ozcinar, C. (Eds.). (2022). Immersive Video Technologies. Academic Press.
- [Waidhofer2022] Waidhofer, J., Gadgil, R., Dickson, A., Zollmann, S., & Ventura, J. (2022, October). PanoSynthVR: Toward light-weight 360-degree view synthesis from a single panoramic input. In 2022 IEEE International Symposium on Mixed and Augmented Reality (ISMAR) (pp. 584-592). IEEE.
- [Wisotzky2025] Wisotzky, E. L., Rosenthal, J. C., Meij, S., van den Dobblesteen, J., Arens, P., Hilsmann, A., … & Schneider, A. (2025). Telepresence for surgical assistance and training using eXtended reality during and after pandemic periods. Journal of telemedicine and telecare, 31(1), 14-28.
- [Yagi1999] Yagi, Y. (1999). Omnidirectional sensing and its applications. IEICE transactions on information and systems, 82(3), 568-579.
- [Zerman2024] Zerman, E., Gond, M., Takhtardeshir, S., Olsson, R., & Sjöström, M. (2024, June). A Spherical Light Field Database for Immersive Telecommunication and Telepresence Applications. In 2024 16th International Conference on Quality of Multimedia Experience (QoMEX) (pp. 200-206). IEEE.
- [Zhi2020] Zhi, T., Lassner, C., Tung, T., Stoll, C., Narasimhan, S. G., & Vo, M. (2020). TexMesh: Reconstructing detailed human texture and geometry from RGB-D video. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X 16 (pp. 492-509). Springer International Publishing.