A Lightweight 3D Convolutional Autoencoder Architecture for Temporal Coherence in 3D Video
Abstract
We propose a novel lightweight 3D convolutional autoencoder architecture designed to efficiently encode and decode spatiotemporal information from 3D video data while preserving temporal coherence between frames. We present a theoretical analysis of the stability of temporal feature representations and validate the approach on a synthetic 3D video dataset of moving volumetric shapes. Experimental results demonstrate that our method reconstructs 3D videos with high fidelity and smooth temporal transitions, highlighting its potential for real-world 3D video processing applications.
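The abstract summarizes the architecture without giving its layer configuration, so the following is a minimal PyTorch sketch of one plausible realization, not the authors' implementation: the channel widths, kernel sizes, per-frame weight sharing across time, and the latent smoothness penalty are all illustrative assumptions.

import torch
import torch.nn as nn

class Lightweight3DAutoencoder(nn.Module):
    """Hypothetical per-frame volumetric autoencoder with weights shared across time."""

    def __init__(self, in_channels: int = 1, base: int = 8):
        super().__init__()
        # Encoder: two strided 3D convolutions halve each spatial dimension twice,
        # compressing every volumetric frame into a compact latent code.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, base, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(base, base * 2, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder mirrors the encoder with transposed convolutions.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(base * 2, base, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(base, in_channels, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, video: torch.Tensor):
        # video: (batch, time, channels, depth, height, width)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)  # fold the time axis into the batch axis
        codes = self.encoder(frames)
        recon = self.decoder(codes)
        # restore the time axis on both the reconstruction and the latent codes
        return recon.unflatten(0, (b, t)), codes.unflatten(0, (b, t))

def temporal_coherence_loss(codes: torch.Tensor) -> torch.Tensor:
    # Penalize frame-to-frame jumps in latent space; one simple way to
    # encourage the smooth temporal transitions the abstract describes.
    return (codes[:, 1:] - codes[:, :-1]).pow(2).mean()

model = Lightweight3DAutoencoder()
clip = torch.rand(2, 8, 1, 16, 16, 16)  # two clips of eight 16^3 volumetric frames
recon, codes = model(clip)
loss = nn.functional.mse_loss(recon, clip) + 0.1 * temporal_coherence_loss(codes)

Sharing the encoder across frames keeps the model lightweight, while the coherence term couples neighboring latents; a joint 4D convolution over space and time would be an alternative, but PyTorch has no native Conv4d, so the per-frame factorization is the simpler assumption here.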
Article Details

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/