Automatic detection of persistent physiological changes after COVID infection via wearable devices with potential for long COVID management


August 11, 2025

Study design and data collection procedures

Here, we briefly describe the details of our previous study and their relevance to the current study. For further details on the study design, data collection platform, and device selection, please see²⁶. After enrollment, study participants were given a Garmin watch (Fenix 6 or Vivoactive 4 model) and/or an Oura ring. The physiological parameters used in the current study from the Garmin devices were respiration rate (RR) and pulse oximetry (SpO2), while those from the Oura device were inter-beat interval (IBI), skin temperature, and sleep hypnogram. The physiological parameters from both the Garmin and Oura wearables were aggregated per sleep episode, as determined by Oura’s hypnogram, into sets of statistics (e.g., mean, 5th percentile, 95th percentile, standard deviation, coefficient of variation). No data outside Oura’s hypnogram periods were used. Integration with Garmin watches was achieved by creating a Garmin Connect Developer account that enabled Garmin watch participants to enroll in the data sharing study via an OAuth authentication process. Integration with Oura rings was achieved by creating an Oura Teams account that enabled Oura ring participants to consent to share their data with the study via a secure authentication step during registration. Participant data were automatically downloaded from the Oura cloud, without the collection of any personally identifiable information, via custom software that was scheduled to run frequently on the study platform. For additional details on the data collection platform and on device selection and handling, please see²⁶.

Participants were asked to complete a daily web-based survey, which collected self-reported symptoms and fiducial points for vaccination and for positive or negative test results for infections, including COVID-19. Individuals were tested by COVID-19 RT-PCR or rapid test and reported test dates and results via the daily survey. For all COVID-19 positive cases, the survey data were periodically verified with the individual by study personnel to ensure that the results and symptom fiducial points were accurate (see²⁶ for additional details). For the final cohort, only one COVID-19 test was used per user. Thus, in this study, all users were in one of three mutually exclusive groups: COVID-19 Negative, COVID-19 Acute, or COVID-19 with persistent changes.

Ethical approval

The collection and use of the wearable dataset was approved by the Institutional Review Boards of the US Department of Defense. All methods were carried out in accordance with relevant guidelines and regulations. Informed consent was obtained from all participants.

Physiological parameters

The physiological parameters monitored in this study include direct measurements from the wearable devices and custom analytics derived from IBIs recorded during sleep (as detected by Oura’s hypnogram). We chose to focus on physiological data recorded during sleep because sleep provides relatively stable, high-quality measurements²⁶ of pulse rate, skin temperature, and SpO2. Respiratory rate was derived from IBIs, and additional custom heart rate variability features were also computed. The complete list of physiological features collected during an entire sleep episode is summarized in Supplementary Note 4.

For each physiological parameter, a set of 23 statistics was computed to aggregate the physiological data during sleep: min, max, mean, std, skewness, kurtosis, hyperskewness, hypertailedness, pct_1 (1st percentile), pct_5, pct_10, pct_20, pct_25, pct_30, pct_40, pct_50, pct_60, pct_70, pct_75, pct_80, pct_90, pct_95, and pct_99. This yields a total of 34 physiological parameters × 23 statistics = 782 features per sleep segment.
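A minimal sketch of the per-segment aggregation follows. The text does not state the conventions, so this sketch assumes that `std` is the population standard deviation, that kurtosis is the (non-excess) fourth standardized moment, that hyperskewness and hypertailedness are the fifth and sixth standardized moments, and that percentiles use linear interpolation. The function names are hypothetical, and only the Python standard library is used.

```python
import math
from typing import Dict, Sequence

# Percentile levels listed in the text (pct_1 ... pct_99).
PERCENTILES = [1, 5, 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 95, 99]


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (assumption: NumPy's default convention)."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def standardized_moment(vals: Sequence[float], mean: float, std: float, order: int) -> float:
    """E[((x - mean) / std) ** order]; returns 0.0 for a constant signal."""
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** order for v in vals) / len(vals)


def aggregate_sleep_segment(vals: Sequence[float]) -> Dict[str, float]:
    """Collapse one parameter's samples from one sleep episode into 23 statistics."""
    if not vals:
        raise ValueError("empty sleep segment")
    s = sorted(vals)
    n = len(s)
    mean = sum(s) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in s) / n)  # population SD (assumption)
    stats = {
        "min": s[0],
        "max": s[-1],
        "mean": mean,
        "std": std,
        "skewness": standardized_moment(s, mean, std, 3),
        "kurtosis": standardized_moment(s, mean, std, 4),  # non-excess (assumption)
        "hyperskewness": standardized_moment(s, mean, std, 5),
        "hypertailedness": standardized_moment(s, mean, std, 6),
    }
    for q in PERCENTILES:
        stats[f"pct_{q}"] = percentile(s, q)
    return stats
```

Applying this function to each of the 34 parameters in a sleep segment gives the 34 × 23 = 782 features described above.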

Data window selection

For each participant, we established three time windows: (1) a baseline window, (2) a testing window, and (3) a study window. The baseline window serves as a control reference prior to the testing date, as shown in Fig. 6; it spans 4 weeks (28 days), from 6 weeks before the testing day to 2 weeks before the testing day. The infection (testing) window, defined as the time frame around the day on which clinical testing occurred, spans 2 weeks (14 days), from one week before to one week after the testing day. The study window spans 4 weeks (28 days), from 4 weeks after the testing day to 8 weeks after the testing day. The start and duration of the study window were chosen based on the CDC guidelines for detecting Long COVID⁵,⁶. While the CDC defines post-COVID conditions as symptoms lasting at least four weeks after infection, the WHO, the National Academy of Sciences, and other international bodies define Long COVID more conservatively, typically requiring the presence of symptoms at least 12 weeks after acute infection. The 4–8 week study window used in this work, though informed by CDC guidance, may therefore reflect an early post-acute recovery period rather than the chronic phase of Long COVID. Accordingly, our findings should be interpreted as indicative of infection-acquired persistent physiological changes, which may represent precursors or components of Long COVID but cannot, in the absence of a formal clinical diagnosis, be equated directly with WHO-defined cases.
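The three windows can be expressed as day offsets from the test date. This sketch assumes half-open [start, end) intervals so that each window contains exactly the stated number of days; `WINDOWS` and `assign_window` are illustrative names, not part of the study software. Nights between windows (for example, 2 weeks to 1 week before the test) belong to no window.

```python
from datetime import date, timedelta
from typing import Optional

# Half-open [start, end) day offsets relative to the test date (assumption:
# the text gives durations, not whether endpoints are inclusive).
WINDOWS = {
    "baseline": (-42, -14),  # 6 to 2 weeks before the test: 28 days
    "testing": (-7, 7),      # 1 week before to 1 week after: 14 days
    "study": (28, 56),       # 4 to 8 weeks after the test: 28 days
}


def assign_window(night: date, test_date: date) -> Optional[str]:
    """Return the window a night falls in, or None if it is outside all three."""
    offset = (night - test_date).days
    for name, (start, end) in WINDOWS.items():
        if start <= offset < end:
            return name
    return None


# Usage: a night 30 days after the test falls in the study window.
print(assign_window(date(2022, 3, 1) + timedelta(days=30), date(2022, 3, 1)))  # study
```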

Wearable compliance

Compliance refers to the percentage of time participants wore the wearable device(s) in a way that produced usable data for analysis. Data from wearable devices often vary in completeness because of inconsistent usage or technical issues. In this study, we derived physiological parameters from data collected by the Oura ring, the Garmin watch, or both. The availability of a given physiological parameter depended on the presence of the corresponding raw physiological signals, such as HR, temperature, blood oxygen saturation (SpO2), and IBI; aggregated statistics were available only when the corresponding raw data were fully present. As a result, compliance varied across physiological parameters and across users. Inconsistent device usage therefore posed a key challenge, because the resulting data gaps can compromise the accuracy and reliability of physiological feature extraction and analysis.

To mitigate this issue, we included only users and physiological features with at least 60% coverage (compliance) in both the baseline and the observation (study) windows. For reference, a 10% compliance threshold implies that participants wore the wearables on at least 10% of the days in each of the baseline and observation windows, which corresponds to roughly 3 days per window, whereas an 80% compliance threshold indicates near-continuous usage (about 22 of the 28 days in each window). The 60% threshold used here corresponds to roughly 17 days of nightly physiological measurements per window.

A day was considered compliant for a given physiological parameter if the participant had valid wearable-derived data covering at least one complete sleep segment, as detected by the device’s (Oura or Garmin) built-in sleep algorithm. In this study, we only considered sleep periods with ≥ 4 h of consecutive nighttime data. Days with < 4 h of usable sleep data or data recorded exclusively during waking hours were excluded from compliance calculations.

To account for the inherent variability in the availability of each wearable physiological parameter, the compliance criterion was evaluated separately for each parameter. After applying the 60% compliance threshold to every physiological parameter, on average 31.7% of COVID-positive and 22.5% of COVID-negative subjects were included in the analysis. Alternative compliance thresholds and the resulting sample sizes are shown in Supplementary Note 3.
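The per-parameter compliance rule can be sketched as follows. This is a minimal sketch, assuming that each window is summarized as a hypothetical mapping from day offset to the hours of valid, consecutive nighttime data for one parameter; the constant and function names are illustrative.

```python
from typing import Dict

MIN_SLEEP_HOURS = 4.0        # nights with < 4 h of consecutive sleep data do not count
COMPLIANCE_THRESHOLD = 0.60  # 60% of the days in each window
WINDOW_DAYS = 28             # baseline and study windows are both 28 days long


def compliant_days(sleep_hours_by_day: Dict[int, float]) -> int:
    """Count days whose valid sleep segment for this parameter lasts at least 4 h."""
    return sum(1 for hours in sleep_hours_by_day.values() if hours >= MIN_SLEEP_HOURS)


def passes_compliance(baseline: Dict[int, float], study: Dict[int, float]) -> bool:
    """Keep a (subject, parameter) pair only if BOTH windows reach 60% coverage."""
    return all(
        compliant_days(window) / WINDOW_DAYS >= COMPLIANCE_THRESHOLD
        for window in (baseline, study)
    )
```

With 28-day windows, the 60% threshold therefore requires at least 17 compliant nights in each window (16/28 ≈ 57%, 17/28 ≈ 61%).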

Determining infection-acquired feature deflection

To establish our methodology for detecting sustained changes long after the acute phase of COVID-19 infection, we first defined, for each physiological parameter, a direction of change associated with infection; that is, we identified how COVID-19 viral infection shifts each parameter away from its baseline during the infection period. We used a data-driven approach to determine whether the deviation of a physiological parameter during the study window is in the same direction as its deviation during the acute infection, both relative to baseline levels. For instance, if a comparison of the baseline and infection windows shows an increase in heart rate during infection, we treat increases in heart rate as infection-associated. When we then compare the baseline window with the study window, only increases in heart rate are considered, and decreases are ignored or treated as normal. This approach allows us to define the healthy state or direction for all features in a data-driven way, including custom features for which the healthy direction may not be obvious. In addition, this conservative filtering aims to further reduce false positives by including only persistent changes that are related to an adverse state of health. Moreover, integrating the direction of deflection provides a more comprehensive understanding of how COVID-19 infection may alter physiological parameters captured via wearables.

We acknowledge the limitation that, in some instances, an abnormal change may not be reflected in the average value of a feature across windows; for example, an abnormal change could alter the variance, skewness, or tailedness rather than the central tendency. However, we expect at least a subset of these parameters to show a significant deflection in their mean values across windows, whether an elevated deflection, a suppressed deflection, or no change.

For each participant and each physiological parameter, the mean values during the baseline window and during the testing window are calculated, and the deflection is computed from their difference as:

$$\text{Adjusted Deflection} = \frac{\mathrm{Mean}_{\mathrm{infection}} - \mathrm{Mean}_{\mathrm{baseline}}}{\sqrt{\tfrac{1}{2}\left(\sigma_{\mathrm{infection}}^{2} + \sigma_{\mathrm{baseline}}^{2}\right)}} \tag{1}$$

In our calculation, we used the “adjusted deflection” metric, a standardized effect size measure similar to Cohen’s d, to determine the direction of change (Eq. 1). A positive value indicates an increase during the testing (infection) window relative to the baseline window, while a negative value indicates a decrease.

For each feature, we assessed the overall trend across all participants by computing the average adjusted deflection, and we recorded its sign (+ or –) as feature_direction.
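Eq. 1 and the feature_direction sign can be sketched as below. This sketch assumes population variances for σ² and returns a zero deflection when both windows are flat; both are assumptions, as is breaking an exact tie in the average toward +1. The function names are hypothetical.

```python
import math
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def adjusted_deflection(baseline: Sequence[float], infection: Sequence[float]) -> float:
    """Eq. 1: mean shift divided by the root of the average of the two variances."""
    pooled = math.sqrt(0.5 * (pvariance(infection) + pvariance(baseline)))
    if pooled == 0:
        return 0.0  # assumption: a flat signal contributes no direction
    return (mean(infection) - mean(baseline)) / pooled


def feature_direction(per_subject: List[Tuple[Sequence[float], Sequence[float]]]) -> int:
    """Average the per-subject deflections (baseline, infection) and keep only the sign."""
    avg = mean(adjusted_deflection(b, i) for b, i in per_subject)
    return 1 if avg >= 0 else -1  # assumption: an exact tie counts as +1
```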

In a supplementary analysis, we also reported the significance and magnitude of the average adjusted deflections for the top 10 features. See Supplementary Note 5 for more information.

Detection of persistent change using null distribution from negative cohorts

To identify users whose physiological parameters showed significant persistent changes after acute COVID-19 infection, we used a dual-cohort approach involving users who tested positive as well as users who tested negative for COVID-19. This methodology allowed us to detect individuals with persistent changes specifically attributable to the aftermath of COVID-19 while accounting for natural variability. The steps of the dual-cohort methodology are as follows:

1. Baseline and confidence interval (CI) calculation per physiological parameter.

We began by establishing individualized baselines for each physiological parameter from pre-infection data, i.e., data unaffected by COVID-19. For each subject and each physiological parameter, a 90% confidence interval (CI) was calculated over the baseline window, representing the typical range within which 90% of the baseline values are expected to fall. The CI bounds serve as upper and lower thresholds for identifying deviations that are likely to be significant.
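The text describes the 90% CI as the range expected to contain 90% of baseline values. One reading, assumed in this sketch, is the empirical 5th to 95th percentile range of a subject's baseline nights; a parametric band of mean ± 1.645 SD would be an equally plausible alternative. `baseline_band` is a hypothetical name.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def baseline_band(baseline_nights: Sequence[float]) -> Tuple[float, float]:
    """(lower, upper) bounds covering the central 90% of baseline nights (assumption)."""
    cuts = quantiles(baseline_nights, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return cuts[0], cuts[-1]
```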

2. Study window and identification of significant changes.

Significant changes after acute infection were identified by flagging a physiological parameter when its nightly measurements fell outside the 90% CI of its baseline (above or below the CI band) on more than 90% of the nights in the 4-week study window. We call the percentage of study-window nights that fall outside the band the out-of-bound percent. A high out-of-bound percent indicates a sustained deviation from the individualized pre-infection baseline, suggesting a persistent impact of COVID-19. This stringent criterion ensures that only sustained and consistent deviations, indicative of possible lingering effects of COVID-19, were flagged as significant.
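The out-of-bound percent and the 90% flag can be sketched as follows, assuming the percentage is taken over the study-window nights with valid data (rather than over a fixed 28 nights) and that the band bounds come from the baseline step. The function names are hypothetical.

```python
from typing import Sequence


def out_of_bound_percent(study_nights: Sequence[float], lower: float, upper: float) -> float:
    """Percentage of study-window nights falling outside the baseline band."""
    if not study_nights:
        raise ValueError("no study-window nights")
    outside = sum(1 for v in study_nights if v < lower or v > upper)
    return 100.0 * outside / len(study_nights)


def is_persistent(study_nights: Sequence[float], lower: float, upper: float,
                  min_percent: float = 90.0) -> bool:
    """Flag a parameter when MORE than 90% of the nights are out of bounds."""
    return out_of_bound_percent(study_nights, lower, upper) > min_percent
```

For example, 19 out-of-band nights out of 20 give an out-of-bound percent of 95 and are flagged, whereas exactly 90% is not, because the criterion is strictly "more than 90%".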

3. Null distribution using COVID-19 negative cohorts.

To further validate our findings, we employed data from the COVID-19 negative cohort as a null distribution. This control group consisted of individuals who tested negative for COVID-19, had no symptoms during the baseline window, and remained symptom-free throughout the testing window, as well as the study window. By applying the aforementioned criteria, we aimed to ensure that the physiological parameters of the COVID-19 negative cohort were representative of uninfected conditions.

The out-of-bound percent calculated in the previous step must differ significantly from that of the corresponding physiological feature in the COVID-19 negative cohort. This criterion helps ensure that the observed changes are specifically related to COVID-19 infection rather than to natural variability or biases. Significance was determined by comparing the out-of-bound percent of each feature in the COVID-19 positive cohort with a null distribution derived from the COVID-19 negative cohort, at a 5% significance level: for each feature, subjects whose out-of-bound percent exceeded the 95th percentile of the same feature in the COVID-19 negative cohort were selected.

By comparing the changes observed in the COVID-19 positive cohort against this null distribution, we were able to distinguish changes specifically attributable to the post-COVID-19 condition from those that might occur naturally or due to other unrelated factors.
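A sketch of the per-feature null comparison is shown below, assuming the 95th percentile is computed with linear interpolation and that subjects are keyed by a hypothetical string ID. Only subjects strictly above the cutoff are selected, matching "exceeding" in the text.

```python
from statistics import quantiles
from typing import Dict, List, Sequence


def null_threshold(negative_oob: Sequence[float]) -> float:
    """95th percentile of the negative cohort's out-of-bound percent for one feature."""
    return quantiles(negative_oob, n=20, method="inclusive")[-1]


def select_exceeding(positive_oob: Dict[str, float],
                     negative_oob: Sequence[float]) -> List[str]:
    """IDs of COVID-19 positive subjects whose out-of-bound percent exceeds the cutoff."""
    cutoff = null_threshold(negative_oob)
    return [sid for sid, value in positive_oob.items() if value > cutoff]
```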

4. Direction of change over the study window.

The physiological feature needs to show a deflection in the same direction as that of the infection window (feature_direction). This criterion helps to reduce false positives and identify only changes that imply a deterioration in health status. Building on the infection-acquired feature deflection, we further analyzed the direction of these changes to understand their association with COVID-19 infection. Significant physiological parameter changes, identified by their sustained deviation from the 90% CI of the baseline for more than 90% of the time during the study window, were examined to determine whether these deviations indicated an increase or decrease relative to the baseline. Only the deviations that matched the sign of feature_direction were retained.
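The direction check can be sketched by comparing the sign of the study-window shift with feature_direction. The text does not say how the direction of the study-window deviation is measured; this sketch assumes it is the sign of the study-window mean minus the baseline mean, and `matches_direction` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def matches_direction(baseline: Sequence[float], study: Sequence[float],
                      feature_direction: int) -> bool:
    """True if the study-window shift has the same sign as feature_direction (+1/-1)."""
    shift = mean(study) - mean(baseline)
    if shift == 0:
        return False  # no deflection at all: nothing to retain
    return (shift > 0) == (feature_direction > 0)
```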

5. Feature down-selection based on coverage and correlation.

After the individual feature selection steps above, the following detection-fusion procedure is applied as a post-hoc correction for the number of tests performed against the null distribution:

Begin with a ranked list of features, prioritizing those with the highest coverage (compliance) in the COVID-19 positive cohort.
Proceeding down the ranked list, eliminate co-triggered features, i.e., features whose detection outputs have a Spearman correlation of 0.8 or higher across the population with those of a feature already retained.
After this step, we obtain a subset of M features whose detection outputs are pairwise correlated below 0.8.
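The down-selection can be sketched as a greedy pass over features sorted by coverage. This sketch assumes that each feature's detection output is a hypothetical per-subject 0/1 vector over the same subjects, computes Spearman correlation as the Pearson correlation of average ranks (ties share ranks), and treats a constant vector as uncorrelated; those choices are assumptions, not details from the text.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of average ranks; 0.0 if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def down_select(detections: Dict[str, Sequence[int]], coverage: Dict[str, float],
                max_corr: float = 0.8) -> List[str]:
    """Keep features in order of coverage unless co-triggered with a kept feature."""
    kept: List[str] = []
    for name in sorted(detections, key=lambda f: coverage[f], reverse=True):
        if all(spearman(detections[name], detections[k]) < max_corr for k in kept):
            kept.append(name)
    return kept
```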

6. Per-subject detection of persistent change.

Once persistent changes have been marked per physiological parameter, a detection for an individual subject was triggered according to the following criteria:

a) Count available features:

Determine the count F, i.e., how many of the M features are available (non-missing) for the user, whether or not they are flagged as showing persistent changes; this accounts for potential compliance issues.

b) Post-hoc correction with a binomial distribution and detection:

Assuming that the remaining features are independent (as approximated by the correlation-based down-selection in Step 5), model the number of triggered features among the F available features with a binomial distribution with success probability p = 0.05, corresponding to the 5% false positive rate (Eq. 2).

$$P\left(S \mid F, p\right) = \binom{F}{S} p^{S} \left(1-p\right)^{F-S} \tag{2}$$

The detection threshold K is chosen such that (Eq. 3):

$$P\left(X \ge K \mid F, p\right) \le 0.05 \tag{3}$$

Or (Eq. 4):

$$K_{0.05} = \operatorname*{arg\,min}_{K \ge K_{\mathrm{mean}}} \left| \binom{F}{K} p^{K} \left(1-p\right)^{F-K} - 0.05 \right| \tag{4}$$

Count S, the total number of features triggered for the user.
If S ≥ K, classify the user as exhibiting persistent physiological changes.
If S < K, consider no persistent change detected, and interpret the triggered features as likely resulting from the expected 5% false positive rate of the comparison with the null distribution.
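The per-subject decision can be sketched with the Python standard library. This sketch reads Eq. 3 as an upper-tail criterion, choosing the smallest K with P(X ≥ K | F, p) ≤ 0.05, and sums the tail directly from Eq. 2; it does not implement the argmin variant in Eq. 4. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Eq. 2: probability of exactly k triggered features out of n available."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def detection_threshold(f: int, p: float = 0.05, alpha: float = 0.05) -> int:
    """Smallest K with P(X >= K | F, p) <= alpha, summing the upper tail directly."""
    for k in range(f + 2):
        tail = sum(binom_pmf(s, f, p) for s in range(k, f + 1))
        if tail <= alpha:
            return k
    return f + 1  # not reached: the empty tail at k = f + 1 sums to 0


def has_persistent_change(s: int, f: int, p: float = 0.05) -> bool:
    """Classify a subject as showing persistent change when S >= K."""
    return s >= detection_threshold(f, p)
```

Under these assumptions, a subject with F = 20 available features needs at least K = 4 triggered features (P(X ≥ 3) ≈ 0.075, P(X ≥ 4) ≈ 0.016), and a subject with F = 10 needs K = 3.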

The flow chart in Fig. 7 shows the data-driven process for detecting candidates with persistent change.

In total, we extracted 782 physiological features per day for every user. Because of data availability and user compliance, not all of them were available for every day from the start of the baseline window to the end of the study window. For each user, physiological features with at least 60% availability (compliance) in both the baseline and study windows were examined, and those showing persistent changes after acute infection during the study window were identified. Based on these features, we classified each user’s study window observation as showing either persistent or non-persistent change, using a threshold derived from a binomial distribution that models the number of positive outcomes (i.e., physiological features with persistent changes) expected by random chance. The threshold is the number of physiological features with persistent changes needed to consider an observation significantly positive, given a predefined false positive rate of 5%. Given the binomial probability mass function with $n$ equal to the number of available features, $k$ is chosen as the smallest count for which the probability of observing $k$ or more triggered features by chance does not exceed 5%.

Symptom prevalence analysis

To test whether the “persistent-change” group identified purely from wearable data also showed a higher prevalence of self-reported symptoms, we compared weekly symptom prevalence between the COVID positive and COVID positive with persistent change groups during the five-week interval used for physiological assessment (study window, weeks 4–8 after the test date). Each daily survey captured the presence/absence of CDC-listed symptoms (Supplementary Note 6). For every participant and every week during the study window, we created a binary variable: symptom = 1 if ≥ 1 symptom was reported on any day of that week, otherwise 0. Thus, each participant contributed one independent observation per week. For each week, we constructed a 2 × 2 contingency table (group × symptom presence) and applied Fisher’s exact test (two-sided). Family-wise Type-I error across the weekly tests was controlled with Holm adjustment. We additionally pooled weeks 4–8 into a single table to obtain a more precise risk estimate for “any symptom during the study window”.
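The weekly comparison can be sketched with a standard-library Fisher's exact test and Holm adjustment. This sketch uses the common two-sided definition (summing all tables with the same margins that are no more probable than the observed one, as SciPy's `fisher_exact` does) and assumes each week's 2 × 2 table has already been tabulated from the binary weekly symptom indicators; the function names are hypothetical.

```python
from math import comb
from typing import List, Sequence


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Sum every table no more probable than the observed one (small float tolerance).
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, total)


def holm_adjust(pvalues: Sequence[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running)
    return adjusted
```

For example, the table [[3, 1], [1, 3]] gives a two-sided p of 34/70 ≈ 0.486, and Holm adjustment of the raw p-values [0.01, 0.04, 0.03] gives [0.03, 0.06, 0.06].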

Monte Carlo analysis for comparison of prevalence rates

To evaluate whether the prevalence of persistent physiological changes observed in COVID-19 positive subjects was significantly different from what might be expected by chance, we performed a Monte Carlo simulation using COVID-19 negative cohorts as a reference.

For comparison between the positive and negative cohorts, we conducted 1,000 Monte Carlo simulations using data from the COVID-19 negative cohort. Given N subjects in the COVID-19 positive cohort, in each simulation we randomly selected N subjects from the COVID-19 negative cohort and applied the same compliance criteria (at least 60% available nightly measurements per physiological parameter). By comparing the observed prevalence rate in the COVID-19 positive group with the distribution of rates obtained from these simulations, we assessed whether the observed rate of persistent changes was significantly elevated beyond what might be expected from random variation in the negative cohort.
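The simulation can be sketched as below. This is a minimal sketch, assuming each negative subject is summarized by a hypothetical pair (passes the 60% compliance filter, detected as having persistent change), that draws are made without replacement when the negative cohort is large enough (and with replacement otherwise), and that significance is summarized by a one-sided empirical p-value with the common +1 correction; none of these details are stated in the text.

```python
import random
from typing import List, Sequence, Tuple

# One entry per COVID-19 negative subject: (passes the 60% compliance filter,
# detected as having persistent change by the same pipeline).
Subject = Tuple[bool, bool]


def prevalence(draw: Sequence[Subject]) -> float:
    """Detected fraction among the compliant subjects in one draw."""
    kept = [detected for compliant, detected in draw if compliant]
    return sum(kept) / len(kept) if kept else 0.0


def monte_carlo_null(negatives: Sequence[Subject], n_positive: int,
                     observed: float, n_sims: int = 1000,
                     seed: int = 0) -> Tuple[List[float], float]:
    """Null distribution of prevalence from size-N draws of the negative cohort."""
    rng = random.Random(seed)
    pool = list(negatives)
    sims: List[float] = []
    for _ in range(n_sims):
        if n_positive <= len(pool):
            draw = rng.sample(pool, n_positive)  # without replacement
        else:
            draw = rng.choices(pool, k=n_positive)  # assumption: with replacement
        sims.append(prevalence(draw))
    # One-sided empirical p-value with the common +1 correction.
    exceed = sum(1 for s in sims if s >= observed)
    return sims, (exceed + 1) / (n_sims + 1)
```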

Risk analysis using logistic regression

We performed a logistic regression to assess the association between various risk factors and persistent physiological changes in COVID-19 positive subjects. The model included sex, age group, vaccination status (none vs. one or more doses), and weight category as covariates. The outcome variable was whether a subject was detected as having persistent physiological changes after infection.
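A dependency-free sketch of the model structure follows: categorical covariates are dummy-coded against a reference level, and the coefficients are fitted by maximum likelihood using plain batch gradient ascent. This is not the study's software. The covariate names and level labels in the usage example are hypothetical encodings. In practice, a statistics package such as statsmodels (`Logit`) would also provide standard errors and confidence intervals. Odds ratios are `exp` of the fitted coefficients.

```python
import math
from typing import Dict, List, Sequence


def design_matrix(rows: Sequence[Dict[str, str]],
                  reference: Dict[str, str]) -> List[List[float]]:
    """Intercept plus dummy columns, dropping each covariate's reference level."""
    levels = {cov: sorted({r[cov] for r in rows} - {ref}) for cov, ref in reference.items()}
    return [[1.0] + [1.0 if r[cov] == lvl else 0.0
                     for cov in reference for lvl in levels[cov]]
            for r in rows]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: Sequence[int], lr: float = 0.5,
                 n_iter: int = 5000) -> List[float]:
    """Maximum-likelihood coefficients via batch gradient ascent on the log-likelihood."""
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j, x in enumerate(xi):
                grad[j] += (yi - p) * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Usage with a single hypothetical covariate; the real model adds sex, age group,
# and weight category to `reference` in the same way.
rows = [{"vaccination": "none"}] * 20 + [{"vaccination": "1+"}] * 20
y = [1] * 15 + [0] * 5 + [1] * 5 + [0] * 15
beta = fit_logistic(design_matrix(rows, {"vaccination": "none"}), y)
odds_ratio_vaccinated = math.exp(beta[1])  # close to 1/9 for these toy counts
```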
