The management of sport-related concussion in adolescents has undergone one of the more consequential paradigm shifts in sports medicine. The historical standard — strict cognitive and physical rest, the so-called "cocoon therapy" — has been formally superseded. The 2017 Berlin Consensus and the 2022 Amsterdam Statement both endorse sub-symptom threshold aerobic exercise (SSTAE) as early active management, and the evidence behind this shift is principally anchored by Leddy et al.'s randomized controlled trials (JAMA Pediatrics, 2019; Lancet Child & Adolescent Health, 2021).
Those trials demonstrated that individualized aerobic exercise prescribed within 10 days of injury reduces recovery time and lowers the incidence of persistent post-concussion symptoms. The clinical community has responded: SSTAE is now a standard recommendation. Athletic trainers prescribe it. Guidelines endorse it. The field has moved on.
But every claim about the effectiveness of aerobic exercise rests on a shared assumption — that the patients in these trials actually performed the exercise as prescribed, and that the data collected can tell us whether they did.
This proposal asks whether that assumption has ever been examined.
In current RCT designs, the treatment dose — the exercise actually performed — is typically recorded by the same patient who reports the outcome. Adherence is measured by self-report diary. Recovery is measured by self-report symptom scale. The independence of the independent and dependent variables has not been verified.
This question did not originate from a theoretical framework. It emerged from sustained, direct contact with this literature — through a Critically Appraised Topic completed in 2025 that examined seven of the most-cited RCTs on sub-symptom threshold aerobic exercise for adolescent PPCS. Reading those studies carefully, a pattern surfaced that was difficult to ignore: the treatment dose and the clinical outcome were being measured by the same source, in the same way, with no independent verification between them. The discomfort that observation produced is what this proposal is attempting to formalize.
As international consensus guidelines now formally endorse SSTAE as first-line management, the methodological foundations of that endorsement warrant systematic examination — not to challenge the consensus, but to strengthen the evidence base it rests on.
This is not a minor methodological footnote. Chizuk et al. (2022) demonstrated that adherence during week 1 of exercise prescription independently predicted recovery time — a clinically meaningful finding. But that secondary analysis relied on heart rate monitors that malfunctioned for 18% of participants, forcing a return to subjective logs for that subset. Outside of that single study, the measurement basis for adherence claims across this literature has never been systematically evaluated.
A comprehensive multi-database search was conducted across 7 sources — PubMed, CINAHL, SPORTDiscus, Cochrane CENTRAL, APA PsycInfo, Web of Science, and ClinicalTrials.gov — with no date restriction, from database inception through May 2026. Search strings were developed in consultation with ISU Research Librarian Ben Bolin and refined iteratively across two versions.
Stage 1 used ASReview — an open-source AI-assisted screening tool developed at Utrecht University (Netherlands). ASReview uses active learning, a machine learning approach in which the algorithm learns from each labeling decision to progressively rank remaining records by relevance. This enables screening of large corpora with substantially reduced manual workload while maintaining high recall. A stopping rule of 150 consecutive non-relevant records (1.8% proportional threshold) was applied, consistent with Feil et al. (2023).[1] Stage 2 full-text PICO screening is pending.
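The stopping rule described above can be sketched as a simple counter over the labeling history. This is a minimal illustration and not ASReview's own implementation; the function name `reached_stopping_rule` and the label encoding (1 = relevant, 0 = not relevant) are assumptions made for this example.

```python
def reached_stopping_rule(labels, window=150):
    """Return True once `window` consecutive records were labeled non-relevant.

    `labels` is the screening history in the order records were shown:
    1 = relevant, 0 = not relevant (an assumed encoding for this sketch).
    """
    consecutive = 0
    for label in labels:
        # A relevant record resets the run of non-relevant records.
        consecutive = consecutive + 1 if label == 0 else 0
        if consecutive >= window:
            return True
    return False


# Proportional threshold: 150 records relative to the 8,364 candidates.
proportion = 150 / 8364
print(f"{proportion:.1%}")  # prints 1.8%
```

A run of 149 non-relevant records followed by one relevant record resets the counter, so screening would continue in that case.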
Before Stage 2 full-text screening, an exploratory semantic analysis was conducted on all 541 Stage 1 records using a large language model pipeline. The goal was not to summarize what the literature says, which is the work of the systematic review itself, but to understand how the field is organized: which concepts cluster together, which occupy the center, and which sit at the margins.
The analysis was structured around four categories that correspond directly to the methodological anatomy of a randomized controlled trial. Every RCT must answer four questions: What was the intervention? How was the outcome measured? Was any objective verification used? And did patients actually do what was prescribed? These are not arbitrary categories — they are the four load-bearing elements of any trial that claims to establish a dose-response relationship. Mapping the literature against these four dimensions reveals not just what the field has studied, but where its methodological attention has — and has not — been directed.
The interactive map below shows the result. Use the category buttons to filter by dimension. The search box allows navigation of the full 1,815-node network.
Semantic network of 541 screened records. Nodes = extracted clinical concepts; edges = directional relationships (alleviates, causes, assesses, correlates_with, improves, worsens). Node size reflects corpus frequency. Exploratory; not a substitute for systematic data extraction.
Several structural features of this network are immediately apparent. The densest cluster centers on concussion and symptom severity, connected outward through aerobic exercise and active rehabilitation toward recovery outcomes. Heart rate — the primary dosing parameter in BCTT-based exercise prescription — appears as a satellite node: present in the network, connected to the intervention cluster, but not densely linked to the outcome reporting apparatus. The measurement infrastructure exists in this literature, but it occupies the periphery.
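The idea of a "satellite" node can be made concrete with a toy version of such a network. The sketch below is illustrative only: the triples are invented for this example and are not taken from the 541-record corpus, and node degree is used as a rough stand-in for how densely a concept is linked (the map itself sizes nodes by corpus frequency).

```python
from collections import Counter

# Hypothetical (source, relation, target) triples, invented for illustration.
# Relation names follow the edge types listed in the figure caption.
triples = [
    ("aerobic exercise", "alleviates", "symptom severity"),
    ("aerobic exercise", "improves", "recovery time"),
    ("concussion", "causes", "symptom severity"),
    ("symptom scale", "assesses", "symptom severity"),
    ("active rehabilitation", "improves", "recovery time"),
    ("heart rate", "correlates_with", "aerobic exercise"),
]

# Degree = number of edges touching a node, ignoring direction.
degree = Counter()
for source, _relation, target in triples:
    degree[source] += 1
    degree[target] += 1

# Nodes with a single connection sit at the periphery of this toy graph.
satellites = sorted(node for node, d in degree.items() if d == 1)
hubs = [node for node, _ in degree.most_common(2)]
print("hubs:", hubs)
print("satellites:", satellites)
```

In this toy graph, heart rate is connected to the intervention node but to nothing on the outcome side, which mirrors the structural observation above.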
To understand how this structure has evolved over time, the same corpus was analyzed by publication year across four thematic categories. The chart below maps annual publication density from 2000 to 2026.
Bubble area proportional to annual paper count per thematic category. Exploratory analysis of 541 AI-screened records.
Aerobic intervention research was effectively absent from this literature before 2015, then accelerated sharply following the Berlin Consensus. Measurement methods discussion grew in parallel, modestly. The adherence category — which tracks whether patients actually performed the prescribed exercise — shows almost no signal across the entire 26-year window, including the years in which aerobic exercise became the dominant intervention paradigm.
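The year-by-category tally behind a bubble chart of this kind can be sketched as follows. The records and category labels are hypothetical placeholders, not the actual corpus tags. The one design point worth showing is that when bubble area is proportional to paper count, the radius scales with the square root of the count.

```python
import math
from collections import Counter

# Hypothetical (publication_year, category) tags, invented for illustration.
records = [
    (2014, "measurement"),
    (2018, "aerobic intervention"),
    (2018, "aerobic intervention"),
    (2018, "measurement"),
    (2021, "aerobic intervention"),
    (2021, "aerobic intervention"),
    (2021, "aerobic intervention"),
    (2021, "aerobic intervention"),
    (2021, "adherence"),
]

# Annual paper count per thematic category.
counts = Counter(records)


def bubble_radius(count, scale=1.0):
    """Radius that makes the bubble AREA (pi * r^2) proportional to count."""
    return scale * math.sqrt(count / math.pi)


for (year, category), n in sorted(counts.items()):
    print(year, category, n, round(bubble_radius(n), 2))
```

Scaling the radius directly with the count would exaggerate large categories, because a fourfold count would then occupy sixteen times the area.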
The field built an evidence base for aerobic exercise without building a parallel evidence base for whether that exercise was performed. These two questions are not independent. The first cannot be answered confidently without the second having been asked.
Of the 141 papers in this corpus discussing aerobic exercise interventions, the chart below maps how adherence was addressed — from whether it was mentioned at all, to what measurement method was used, to whether any adherence-outcome relationship was reported.
Source: AI-assisted semantic analysis of 541 screened records. All counts verified against raw data. Exploratory purposes only.
| Level of analysis | n | % of aerobic papers | Method |
|---|---|---|---|
| Aerobic exercise papers (base) | 141 | 100% | baseline |
| Adherence discussed | 3 | 2.1% | mixed |
| — with objective measurement | 1 | 0.7% | HR monitor |
| — subjective only | 1 | 0.7% | self-report |
| — measurement not defined | 1 | 0.7% | not reported |
| Adherence–outcome link reported | 2 | 1.4% | — |
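The percentages in the table follow directly from the counts and the 141-paper base. A short check, using only the figures reported in the table (the dictionary keys are shortened labels chosen for this sketch):

```python
BASE = 141  # aerobic exercise papers in the corpus

counts = {
    "adherence discussed": 3,
    "objective measurement": 1,
    "subjective only": 1,
    "measurement not defined": 1,
    "adherence-outcome link reported": 2,
}

# Percentage of aerobic papers, rounded to one decimal place as in the table.
percentages = {label: round(100 * n / BASE, 1) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {counts[label]} of {BASE} ({pct}%)")

# The three measurement sub-rows should account for every paper that discussed adherence.
sub_rows = ("objective measurement", "subjective only", "measurement not defined")
assert sum(counts[k] for k in sub_rows) == counts["adherence discussed"]
```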
The two papers that report an adherence–outcome link are the same papers that currently provide the primary evidence that adherence matters clinically. Chizuk et al. (2022) found that adolescents completing at least two-thirds of their weekly prescription recovered significantly faster (median 12 vs. 21.5 days; p = 0.016). Wingerson et al. (2024) found that treatment efficacy was associated with significantly greater adherence to the 100-minute/week prescription (77% vs. 36%; p = 0.05).
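A threshold-based adherence classification of the kind these findings depend on can be sketched as below. This is an illustration under stated assumptions, not either study's actual procedure: expressing dose in minutes per week, the function names, and the example values are all hypothetical. Only the two-thirds threshold and the 100-minute weekly prescription come from the text above.

```python
def adherence_fraction(minutes_completed, minutes_prescribed):
    """Share of the weekly prescription that was actually performed."""
    if minutes_prescribed <= 0:
        raise ValueError("minutes_prescribed must be positive")
    return minutes_completed / minutes_prescribed


def is_adherent(minutes_completed, minutes_prescribed, threshold=2 / 3):
    """Classify a participant-week as adherent at or above the threshold."""
    return adherence_fraction(minutes_completed, minutes_prescribed) >= threshold


# Hypothetical participant-weeks against a 100-minute weekly prescription.
print(is_adherent(70, 100))
print(is_adherent(60, 100))
```

The classification is only as trustworthy as `minutes_completed`. If that value comes from a self-report diary, the binary label inherits whatever reporting error the diary contains.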
Both findings are clinically important. Both rest on a measurement foundation that has not been systematically examined.
DeMatteo et al. (2024, JMIR Pediatrics) compared accelerometry-based adherence measurement against self-report in 139 youth with concussion and found no significant agreement between the two methods. By accelerometry, only 13% of participants were adherent to Stage 1 of the return-to-activity (RTA) protocol, while self-reported adherence rates were higher. This is the only study in this corpus to have directly tested whether the measurement method changes the adherence conclusion. It found that it does.
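Agreement between two adherence classifications is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below shows the calculation for binary labels. It is an assumption that kappa is a suitable statistic here; the text does not state which agreement statistic the cited study used, and the data are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of binary labels (1/0)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of cases where both methods give the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each method's marginal rate of "adherent".
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        return float("nan")  # kappa is undefined when both methods are constant
    return (observed - expected) / (1 - expected)


# Hypothetical labels for eight participants (1 = adherent, 0 = not adherent).
self_report = [1, 1, 1, 1, 1, 1, 0, 0]
accelerometer = [1, 0, 0, 0, 1, 0, 0, 1]
kappa = cohens_kappa(self_report, accelerometer)
print(round(kappa, 2))
```

A kappa near zero or below means the two methods agree no more often than chance would predict, even when both report plausible adherence rates on their own.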
The groundwork for this review is already in place. Stage 1 screening is complete, and the next step is to move carefully into Stage 2 full-text screening.
05 / Where This Points
The pattern documented above did not emerge from a theoretical framework applied to the literature from the outside. It emerged from systematic contact with this literature, beginning with a Critically Appraised Topic that examined seven of the most-cited RCTs on sub-symptom threshold aerobic exercise for adolescent PPCS, and extending through a screening process that identified 541 relevant records from 8,364 candidates. The adherence gap appeared at both scales, consistently: at the level of individual trial design in the CAT, and at the level of the entire semantic architecture of the corpus in the bibliometric analysis.
What the literature needs, and does not currently have, is a systematic account of how adherence has been defined, measured, and reported across existing trials — and what the variation in those methodological choices implies for interpreting the evidence base. That account cannot be produced by adding another RCT. It requires synthesizing what already exists with explicit attention to measurement quality.
Stage 1 Screening (Completed with ASReview)
A broad relevance screening identified 541 records from 8,364 candidates using ASReview, an open-source AI-assisted tool with active learning. Stage 1 applied inclusive criteria to identify any concussion or mTBI study regardless of intervention type, recognizing that strict PICO standards would be applied during full-text review.
Stage 2 Inclusion Criteria (Pending Full-Text Screening)
Participants: Adolescents aged 12–18 years diagnosed with concussion or mild traumatic brain injury (mTBI). Mixed-age samples are eligible if adolescent data can be disaggregated or adolescents comprise ≥75% of the sample.
Intervention (Critical Criterion): Studies must include aerobic exercise as a primary intervention component. This includes sub-symptom threshold aerobic exercise (SSTAE), graded exercise protocols, treadmill or cycling-based interventions, and heart rate–monitored physical activity prescriptions. Studies are excluded if interventions include only occupational therapy, cognitive therapy, vestibular therapy, rest protocols, pharmacology, or education without aerobic exercise as a primary component.
Condition: Any concussion or mTBI study is eligible. Persistent post-concussion symptoms (PPCS ≥28 days) is a secondary screening criterion; studies are not excluded solely because "persistent" is not explicitly stated, but studies exclusively focused on the acute phase (<14 days) with no persistent symptoms subgroup are excluded.
Outcomes: Primary outcomes include symptom severity, recovery time, return to play or learn, adherence/compliance, and safety. Secondary outcomes include neurocognitive function, balance, quality of life, and mental health measures. Adherence data reporting is flagged for special attention.
Study Design: Randomized controlled trials and controlled trials are primary inclusions. Single-arm pilot and feasibility studies with pre-post data are included as supplementary (labeled separately). Reviews, editorials, case reports, animal studies, and protocol-only papers are excluded.
Decision Rules: Population match alone is insufficient—the intervention must contain aerobic exercise. When in doubt about the intervention component, studies are excluded. When in doubt about population age or symptom timing, studies are included for full-text assessment. Small sample size alone does not trigger exclusion.
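The decision rules above are asymmetric: doubt about the intervention excludes, while doubt about age or symptom timing keeps a study in for closer assessment. One way to make that asymmetry explicit is a small decision function. This is a sketch of one reading of the rules, not the review's actual screening tool; the `Answer` enum, the function name, and the returned labels are hypothetical.

```python
from enum import Enum


class Answer(Enum):
    YES = "yes"
    NO = "no"
    UNCLEAR = "unclear"


def stage2_decision(aerobic_primary, population_match, timing_match):
    """Apply the Stage 2 decision rules to one study.

    Sample size is deliberately not an input: a small sample alone
    does not trigger exclusion.
    """
    # Intervention is the critical criterion: doubt leads to exclusion.
    if aerobic_primary is not Answer.YES:
        return "exclude"
    # A clear mismatch on population or symptom timing excludes the study.
    if Answer.NO in (population_match, timing_match):
        return "exclude"
    # Doubt about age or timing is resolved in favor of further assessment.
    if Answer.UNCLEAR in (population_match, timing_match):
        return "include for full-text assessment"
    return "include"


print(stage2_decision(Answer.YES, Answer.UNCLEAR, Answer.YES))
print(stage2_decision(Answer.UNCLEAR, Answer.YES, Answer.YES))
```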
Three questions structure this synthesis: How was adherence defined in each included study — or was it defined at all? What method was used to measure it — self-report log, heart rate monitor, accelerometry, or none? And was any relationship between adherence and clinical outcome reported, and if so, on what evidentiary basis? These are not questions about whether aerobic exercise works. They are questions about what the current evidence can actually support.
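These three questions map naturally onto a per-study extraction record. The sketch below shows one possible shape for such a record. The class name, field names, and example rows are hypothetical and do not describe any included study; the point is that "not defined" and "not reported" are stored as explicit values rather than left blank.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdherenceExtraction:
    study: str
    definition: Optional[str]  # None records "not defined" as a finding
    method: str  # e.g. "self-report log", "heart rate monitor", "accelerometry", "none", "not reported"
    outcome_link_reported: bool
    outcome_link_basis: Optional[str] = None  # evidentiary basis, if a link was reported


# Hypothetical rows, invented for illustration.
rows = [
    AdherenceExtraction("Study A", "two-thirds of weekly prescription",
                        "heart rate monitor", True, "secondary analysis"),
    AdherenceExtraction("Study B", None, "self-report log", False),
    AdherenceExtraction("Study C", None, "not reported", False),
]

# Tally measurement methods and list studies that never defined adherence.
method_counts = Counter(row.method for row in rows)
undefined = [row.study for row in rows if row.definition is None]

print(dict(method_counts))
print(undefined)
```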
Stage 1 screening is complete (541 records identified; 538 entering Stage 2). Semantic analysis of the full corpus has been completed and validated. The CAT preceding this review provides a confirmed starting point: direct documentation of adherence measurement practices across the seven most-cited trials in this area. Stage 2 full-text screening is the critical next step, expected to yield 15–25 included studies for systematic data extraction.
If "higher adherence predicts faster recovery" — as the current best evidence suggests — then the reliability of adherence measurement is not a peripheral concern. It is the methodological foundation of the dose-response relationship on which clinical recommendations rest. The proposed review makes that foundation visible, examines it systematically, and produces the kind of methodological account the field needs before the next generation of trials is designed.
This systematic review is intended as a foundation, not an endpoint. If the evidence confirms that adherence measurement in this literature lacks the consistency and objectivity needed to support confident dose-response conclusions, the natural next question is what a more rigorous measurement framework would look like — and whether emerging objective assessment approaches might serve as a complementary verification layer alongside existing clinical tools, without adding burden to the clinicians and athletic trainers already managing these patients. That question belongs to future work. This review is the first step in making it worth asking.
The evidence base for this review is not the small number of papers that explicitly focus on adherence. It is the 15–25 aerobic exercise RCTs expected from Stage 2 screening, each of which will be examined for how adherence was defined, measured, and reported. "Not defined" and "not reported" are themselves data points — and in a methodological systematic review, they are arguably the most important ones. The pattern of methodological omissions across a body of trials carries as much evidentiary weight as the patterns of what was reported.
The analysis above is based on Stage 1 screened records and exploratory semantic extraction. Stage 2 full-text screening has not yet begun. The numbers and patterns presented here are preliminary.
"Based on what you've seen — what do you think is the most important unanswered question in this literature?"