Evidence
Study Overview
This analysis examines the academic performance of seven Grade 7 Mathematics cohorts across two examination periods. Two groups (T1 and T2) received the SLAM Labs intervention; five groups (C1–C5) served as controls. All groups sat the same mid-term exam in September 2025 and the same end-term exam in March 2026.
The two exams were not formally standardised against each other, so absolute scores cannot be compared directly across time periods. This report therefore focuses on differential analysis, that is, how the gap between treatment and control groups changed over time, as a more robust approach to detecting any intervention signal.
Raw Score Summary
The table below shows average scores for all seven groups at each time point, along with the change in score from mid-term to end-term.
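| Group | Type | Mid Term | End Term | Change (pp) | Role |
|---|---|---|---|---|---|
| T1 | Treatment | 37.1% | 38.8% | +1.7 | SLAM Labs |
| T2 | Treatment | 42.1% | 40.0% | -2.1 | SLAM Labs |
| C1 | Control | 42.3% | 39.2% | -3.1 | - |
| C2 | Control | 48.6% | 43.7% | -4.9 | - |
| C3 | Control | 49.1% | 43.5% | -5.6 | - |
| C4 | Control | 53.4% | 47.1% | -6.3 | - |
| C5 | Control | 48.0% | 40.7% | -7.3 | - |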
Note: pp = percentage points. Six of the seven groups declined between mid-term and end-term, most likely reflecting a harder end-term paper, increased curriculum difficulty, or both. T1 was the only group to improve.
The most notable feature of the raw data is that every group except T1, treatment and control alike, declined from mid-term to end-term; T1 gained 1.7 percentage points. This near-universal decline makes absolute score changes an unreliable measure of the intervention: if the end-term exam was harder, scores would fall regardless of the intervention, and the treatment groups could appear to have made little or no progress even where their relative position improved.
Differential Analysis
To account for potential variation in exam difficulty, we instead examine the gap between each treatment group and each control group at each time point, defined as the treatment average minus the control average (a negative gap means the treatment group trails). A positive 'change in gap' means the treatment group gained ground on that control, i.e., its relative performance improved regardless of overall score levels.
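As a minimal sketch of this calculation, the snippet below recomputes each gap and its change from the group averages in the Raw Score Summary. The names `SCORES`, `CONTROLS`, and `gap_table` are illustrative, not part of the original analysis, and values are rounded to one decimal place to match the tables.

```python
# Average scores (%) copied from the Raw Score Summary: (mid-term, end-term).
SCORES = {
    "T1": (37.1, 38.8), "T2": (42.1, 40.0),
    "C1": (42.3, 39.2), "C2": (48.6, 43.7), "C3": (49.1, 43.5),
    "C4": (53.4, 47.1), "C5": (48.0, 40.7),
}
CONTROLS = ["C1", "C2", "C3", "C4", "C5"]


def gap_table(treatment):
    """Return {control: (mid_gap, end_gap, change_in_gap)} in percentage points."""
    t_mid, t_end = SCORES[treatment]
    rows = {}
    for control in CONTROLS:
        c_mid, c_end = SCORES[control]
        mid_gap = round(t_mid - c_mid, 1)  # negative: treatment trails the control
        end_gap = round(t_end - c_end, 1)
        rows[control] = (mid_gap, end_gap, round(end_gap - mid_gap, 1))
    return rows


for group in ("T1", "T2"):
    for control, (mid_gap, end_gap, change) in gap_table(group).items():
        print(f"{group} vs {control}: {mid_gap:+.1f} -> {end_gap:+.1f} ({change:+.1f} pp)")
```

Because the gap is a simple difference of averages, anything that shifts every group's score by the same number of points cancels out; a difficulty change that hits high and low scorers differently does not.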
3.1 T1 vs Control Groups
T1 began the study with the lowest scores of all cohorts (37.1%). By end-term, T1 had narrowed the gap with every control group, although it still trailed all five.
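| Control | Mid-Term Gap | End-Term Gap | Change in Gap | Direction |
|---|---|---|---|---|
| C1 | -5.2 pp | -0.4 pp | +4.8 pp | Closed |
| C2 | -11.5 pp | -4.9 pp | +6.6 pp | Closed |
| C3 | -12.0 pp | -4.7 pp | +7.3 pp | Closed |
| C4 | -16.3 pp | -8.3 pp | +8.0 pp | Closed |
| C5 | -10.9 pp | -1.9 pp | +9.0 pp | Closed |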
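T1's gap improvements range from +4.8 pp (vs C1) to +9.0 pp (vs C5). Because T1's own score rose by 1.7 pp, each improvement equals 1.7 pp plus that control's decline, so the largest improvements are against the controls that fell furthest. These were mostly the higher-scoring controls, which is consistent with a scenario where those controls fell further on a harder end-term exam while T1 held its ground. C5 is the exception: it fell the most (7.3 pp) from a mid-range starting score (48.0%).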
3.2 T2 vs Control Groups
T2 began with the second-lowest mid-term score (42.1%), essentially level with C1 (42.3%) and below every other control. It similarly narrowed the gap with every control group, and in one case, against C1, overtook it by end-term.
| Control | Mid-Term Gap | End-Term Gap | Change in Gap | Direction |
|---|---|---|---|---|
| C1 | -0.2 pp | +0.8 pp | +1.0 pp | Closed / Overtook |
| C2 | -6.5 pp | -3.7 pp | +2.8 pp | Closed |
| C3 | -7.0 pp | -3.5 pp | +3.5 pp | Closed |
| C4 | -11.3 pp | -7.1 pp | +4.2 pp | Closed |
| C5 | -5.9 pp | -0.7 pp | +5.2 pp | Closed |
T2's gap improvements range from +1.0 pp (vs C1, where it began essentially level) to +5.2 pp (vs C5). The pattern is directionally consistent with T1, though each improvement is 3.8 pp smaller than T1's against the same control, because T2's own score fell by 2.1 pp while T1's rose by 1.7 pp. This may reflect T2's higher starting position leaving less room for relative gain.
[Charts for the above]
Key Findings
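Universal consistency: All 10 treatment-control pairings show the gap moving in the same direction. Not one pairing shows a treatment group falling behind a control group relative to its starting position. This uniformity is the strongest signal in the dataset, although the 10 pairings are not independent observations: they derive from only seven score changes, and all 10 are positive because both treatment groups' changes (+1.7 pp and -2.1 pp) exceed the best control change (-3.1 pp).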
T2 overtook C1: T2 began the study 0.2 pp behind C1 and ended 0.8 pp ahead — the only instance in this dataset where a treatment group outscored a control group it had previously trailed.
T1 showed larger improvements than T2 across all pairings. T1 started lower and gained more ground, consistent with either a stronger intervention effect on lower-baseline students, or a floor/ceiling dynamic in the scoring distribution.
The improvement in differentials is generally more pronounced against higher-performing controls (C3, C4) than lower ones (C1). This is expected if the end-term paper was harder, since higher-performing groups had further to fall, widening the relative benefit for treatment groups. C5 does not fit this pattern: it shows the largest improvement in both tables despite a mid-range mid-term score (48.0%).
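The findings above follow from one piece of arithmetic: the change in gap for a pairing is the treatment group's change minus the control group's change. The sketch below checks this using the Change column of the Raw Score Summary. All variable names are illustrative.

```python
# Change in average score (end-term minus mid-term, in pp), from the Change column.
TREATMENT_CHANGE = {"T1": 1.7, "T2": -2.1}
CONTROL_CHANGE = {"C1": -3.1, "C2": -4.9, "C3": -5.6, "C4": -6.3, "C5": -7.3}
CONTROL_MID = {"C1": 42.3, "C2": 48.6, "C3": 49.1, "C4": 53.4, "C5": 48.0}

# Change in gap for a pairing = treatment change minus control change.
gap_change = {
    (t, c): round(dt - dc, 1)
    for t, dt in TREATMENT_CHANGE.items()
    for c, dc in CONTROL_CHANGE.items()
}

# All ten values are positive exactly when the weakest treatment change
# is still better than the strongest control change.
all_positive = min(TREATMENT_CHANGE.values()) > max(CONTROL_CHANGE.values())

# T1's advantage over T2 is identical against every control.
t1_minus_t2 = {
    c: round(gap_change[("T1", c)] - gap_change[("T2", c)], 1)
    for c in CONTROL_CHANGE
}

# Controls ordered by mid-term score, to compare starting level with decline.
by_mid_term = sorted(CONTROL_MID, key=CONTROL_MID.get)
for c in by_mid_term:
    print(f"{c}: mid-term {CONTROL_MID[c]:.1f}%, change {CONTROL_CHANGE[c]:+.1f} pp")
```

Seen this way, the ten pairings carry the information of seven group-level changes, which is worth keeping in mind when weighing how strong the uniformity really is.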
Caveats & Limitations
Non-standardised exams: The mid-term and end-term were not scientifically calibrated for difficulty. If the end-term was simply harder, the relative gains seen here could partly reflect differential resilience to difficulty rather than the intervention itself.
Baseline imbalance: Both treatment groups started lower than every control group. Regression to the mean (the statistical tendency for low scorers to rise relative to high scorers over time) cannot be ruled out as a partial explanation.
No statistical testing: No significance tests were applied. The observed differences could fall within normal sampling variation, particularly if group sizes are small.
Unknown confounders: Group assignment method, class sizes, teacher quality, and socioeconomic composition are unknown and could independently influence results.
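To illustrate what a significance test on this data could look like, the sketch below runs an exact permutation test on the seven group-level changes. It is not part of the original analysis and rests on strong assumptions: it treats each group average as one exchangeable unit, which would only be justified under random assignment (unknown here), and it ignores within-group variation and class sizes. The function name `mean_change_difference` is hypothetical.

```python
from itertools import combinations

# Change in average score (pp) per group, from the Raw Score Summary.
CHANGE = {"T1": 1.7, "T2": -2.1, "C1": -3.1, "C2": -4.9,
          "C3": -5.6, "C4": -6.3, "C5": -7.3}
TREATED = ("T1", "T2")


def mean_change_difference(treated):
    """Mean change of the groups labelled 'treated' minus mean change of the rest."""
    rest = [g for g in CHANGE if g not in treated]
    treated_mean = sum(CHANGE[g] for g in treated) / len(treated)
    rest_mean = sum(CHANGE[g] for g in rest) / len(rest)
    return treated_mean - rest_mean


observed = mean_change_difference(TREATED)

# Every way of labelling two of the seven groups as 'treated'.
relabellings = list(combinations(CHANGE, 2))

# One-sided p-value: share of relabellings at least as favourable as the observed one.
# The small tolerance guards against floating-point noise in the comparison.
as_extreme = sum(mean_change_difference(pair) >= observed - 1e-9 for pair in relabellings)
p_value = as_extreme / len(relabellings)

print(f"observed difference: {observed:.2f} pp")
print(f"relabellings: {len(relabellings)}, p-value: {p_value:.3f}")
```

With seven groups there are only 21 possible relabellings, so the p-value can never fall below 1/21 (about 0.048). That coarse resolution shows how little group averages alone can establish; student-level scores would be needed for a meaningful test.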
Conclusion
The differential analysis presents a consistent and directionally clear picture: both treatment groups improved their relative standing against every control group between mid-term and end-term, across all 10 pairings. Exam difficulty alone does not fully account for this pattern: a harder paper would be expected to pull every group's score down, yet T1's average rose, and T2 fell less than C1, which started at almost the same level (42.3% vs 42.1%).
However, this analysis falls short of demonstrating causal effectiveness. The most defensible interpretation is that the SLAM Labs intervention is a promising candidate for further, more rigorous study — ideally involving standardised assessments, randomised group assignment, and a formal Difference-in-Differences statistical framework to control for baseline differences.
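As a rough illustration of the Difference-in-Differences idea, the sketch below computes the simplest version of the estimate from the group averages in this report. It assumes every group carries equal weight (class sizes are unknown) and produces no standard error, so it is a descriptive figure rather than a formal estimate. The names `period_mean` and `diff_in_diff` are illustrative.

```python
from statistics import mean

# Average scores (%) from the Raw Score Summary: (mid-term, end-term).
SCORES = {
    "T1": (37.1, 38.8), "T2": (42.1, 40.0),
    "C1": (42.3, 39.2), "C2": (48.6, 43.7), "C3": (49.1, 43.5),
    "C4": (53.4, 47.1), "C5": (48.0, 40.7),
}
TREATMENT = ["T1", "T2"]
CONTROL = ["C1", "C2", "C3", "C4", "C5"]
MID, END = 0, 1


def period_mean(groups, period):
    """Unweighted mean of the group averages for one exam period."""
    return mean(SCORES[g][period] for g in groups)


def diff_in_diff():
    """Return (treatment change, control change, difference-in-differences) in pp."""
    treat_change = period_mean(TREATMENT, END) - period_mean(TREATMENT, MID)
    control_change = period_mean(CONTROL, END) - period_mean(CONTROL, MID)
    return treat_change, control_change, treat_change - control_change


treat_change, control_change, did = diff_in_diff()
print(f"treatment change: {treat_change:+.2f} pp")
print(f"control change:   {control_change:+.2f} pp")
print(f"difference-in-differences: {did:+.2f} pp")
```

A formal version would fit a regression on student-level scores with group and period terms, which would also yield the uncertainty estimates this summary lacks.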
Analysis prepared based on cohort average scores provided for the September 2025 mid-term and March 2026 end-term examinations.