ViViEchoformer: Deep Video Regressor Predicting Ejection Fraction

Akan, Taymaz; Alp, Sait; Bhuiyan, Md. Shenuarin; Helmy, Tarek; Orr, A. Wayne; Bhuiyan, Md. Mostafizur Rahman; Bhuiyan, Mohammad Alfrad Nobel

Date

2025

Authors

Akan, Taymaz

Alp, Sait

Bhuiyan, Md. Shenuarin

Helmy, Tarek

Orr, A. Wayne

Bhuiyan, Md. Mostafizur Rahman

Bhuiyan, Mohammad Alfrad Nobel

Publisher

Springer

Abstract

Heart disease is the leading cause of death worldwide, and cardiac function as measured by ejection fraction (EF) is an important determinant of outcomes, making accurate EF measurement critical in patient evaluation. Echocardiograms are commonly used to measure EF, but human interpretation is limited by intra- and inter-observer variability. Deep learning (DL) has driven a resurgence in machine learning, leading to advances in medical applications. We introduce ViViEchoformer, a DL approach that uses a video vision transformer to regress the left ventricular ejection fraction (LVEF) directly from echocardiogram videos. The study used a dataset of 10,030 apical-4-chamber echocardiography videos from patients at Stanford University Hospital. By extracting spatiotemporal tokens from the video input, the model captures spatial information and preserves inter-frame relationships, allowing fully automatic EF predictions that aid human assessment and analysis. ViViEchoformer's EF predictions have a mean absolute error of 6.14%, a root mean squared error of 8.4%, a mean squared log error of 0.04, and an R² of 0.55. ViViEchoformer predicted heart failure with reduced ejection fraction (HFrEF) with an area under the curve of 0.83 and a classification accuracy of 87%, using a standard threshold of less than 50% ejection fraction. Our video-based method provides precise left ventricular function quantification, offering a reliable alternative to human evaluation and establishing a basis for echocardiogram interpretation.
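
The abstract reports four regression metrics and two classification metrics derived from the 50% EF threshold. The sketch below shows one common way to compute them from true and predicted EF values. It is not the authors' evaluation code: the function names are hypothetical, the log error uses the `log(1 + x)` convention on EF expressed in percent, and the area under the curve ranks patients by negated predicted EF, all of which are assumptions.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """MAE, RMSE, MSLE and R^2 for EF values given in percent (assumed scale)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        # Assumption: log(1 + x) convention for the squared log error.
        "msle": float(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)),
        "r2": float(1.0 - ss_res / ss_tot),
    }


def reduced_ef_accuracy(y_true, y_pred, threshold=50.0):
    """Fraction of cases where true and predicted EF fall on the same side of the threshold."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true < threshold) == (y_pred < threshold)))


def reduced_ef_auc(y_true, y_pred, threshold=50.0):
    """AUC for detecting true EF < threshold, scored by negated predicted EF.

    Computed as the Mann-Whitney statistic: the probability that a reduced-EF
    case receives a higher score than a non-reduced case, with ties counted as 0.5.
    Requires at least one case on each side of the threshold.
    """
    y_true = np.asarray(y_true, dtype=float)
    scores = -np.asarray(y_pred, dtype=float)        # lower predicted EF -> higher score
    positive = y_true < threshold
    diff = scores[positive][:, None] - scores[~positive][None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


# Toy values for illustration only; these are not data from the study.
toy_true = [60.0, 40.0, 55.0, 30.0]
toy_pred = [58.0, 45.0, 50.0, 35.0]
toy_report = regression_metrics(toy_true, toy_pred)
toy_report["accuracy"] = reduced_ef_accuracy(toy_true, toy_pred)
toy_report["auc"] = reduced_ef_auc(toy_true, toy_pred)
```

Scoring by negated predicted EF keeps the classifier threshold-free for the AUC, while the accuracy applies the same 50% cutoff to both the reference and the prediction.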