Evidence

Study Overview

This analysis examines the academic performance of seven Grade 7 Mathematics cohorts across two examination periods. Two groups (T1 and T2) received the SLAM Labs intervention; five groups (C1–C5) served as controls. All groups sat the same mid-term exam in September 2025 and the same end-term exam in March 2026.

The exams were not standardised in a scientific manner, meaning direct comparison of absolute scores across time periods carries inherent limitations. This report therefore focuses on differential analysis — how the gap between treatment and control groups changed across time — as a more robust approach to detecting any intervention signal.

Raw Score Summary

The table below shows average scores for all seven groups at each time point, along with the change in score from mid-term to end-term.

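| Group | Type | Mid-Term | End-Term | Change (pp) | Role |
|---|---|---|---|---|---|
| T1 | Treatment | 37.1% | 38.8% | +1.7 | SLAM Labs |
| T2 | Treatment | 42.1% | 40.0% | -2.1 | SLAM Labs |
| C1 | Control | 42.3% | 39.2% | -3.1 | |
| C2 | Control | 48.6% | 43.7% | -4.9 | |
| C3 | Control | 49.1% | 43.5% | -5.6 | |
| C4 | Control | 53.4% | 47.1% | -6.3 | |
| C5 | Control | 48.0% | 40.7% | -7.3 | |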

Note: pp = percentage points. Six of the seven groups declined, most likely reflecting a harder end-term paper, increased curriculum difficulty, or both.

The most notable feature of the raw data is that six of the seven groups, treatment and control alike, declined from mid-term to end-term. T1 is the sole exception, gaining 1.7 percentage points. This near-universal decline means that comparing absolute score changes across time is unreliable: if the end-term exam was harder, scores would fall regardless of the intervention, making the treatment groups appear to underperform when they may not be.
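The Change column can be reproduced from the two score columns with a short Python sketch. The group labels T1, T2 and C1 to C5 are assumed to follow the row order of the tables, and the names `scores` and `score_change` are illustrative only.

```python
# Cohort average scores (%) as (mid-term, end-term), from the Raw Score Summary.
# Assumption: group labels follow the row order of the tables.
scores = {
    "T1": (37.1, 38.8),
    "T2": (42.1, 40.0),
    "C1": (42.3, 39.2),
    "C2": (48.6, 43.7),
    "C3": (49.1, 43.5),
    "C4": (53.4, 47.1),
    "C5": (48.0, 40.7),
}


def score_change(mid: float, end: float) -> float:
    """Change from mid-term to end-term in percentage points (rounded to 1 dp)."""
    return round(end - mid, 1)


changes = {group: score_change(mid, end) for group, (mid, end) in scores.items()}
gainers = [group for group, change in changes.items() if change > 0]

for group, change in changes.items():
    print(f"{group}: {change:+.1f} pp")
print("Groups that gained:", gainers)  # only T1 has a positive change
```

Rounding to one decimal place matches the precision of the reported averages and avoids floating-point noise in the subtraction.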

Differential Analysis

To account for potential variation in exam difficulty, we instead examine the gap between each treatment group and each control group at each time point, where the gap is the treatment group's average minus the control group's average (negative when the treatment group trails). A positive 'change in gap' means the treatment group gained ground on that control; that is, its relative performance improved, independent of overall score levels.
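The gap calculation described above can be sketched in Python as follows. The function names `gap` and `change_in_gap` are illustrative and not part of the original analysis. The example values are the T1 averages and the first control row of the Raw Score Summary.

```python
def gap(treatment: float, control: float) -> float:
    """Treatment average minus control average, in percentage points.

    A negative value means the treatment group trails the control.
    """
    return round(treatment - control, 1)


def change_in_gap(t_mid: float, t_end: float, c_mid: float, c_end: float) -> float:
    """End-term gap minus mid-term gap; positive means the treatment group gained ground."""
    return round(gap(t_end, c_end) - gap(t_mid, c_mid), 1)


# Example: T1 (37.1% -> 38.8%) against the first control row (42.3% -> 39.2%).
mid_gap = gap(37.1, 42.3)
end_gap = gap(38.8, 39.2)
delta = change_in_gap(37.1, 38.8, 42.3, 39.2)
print(f"mid-term gap {mid_gap:+.1f} pp, end-term gap {end_gap:+.1f} pp, change {delta:+.1f} pp")
```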

3.1 T1 vs Control Groups

T1 began the study with the lowest scores of all cohorts (37.1%). By end-term, T1 had narrowed the gap with every control group, although it still trailed all five.

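| Control | Mid-Term Gap | End-Term Gap | Change in Gap | Direction |
|---|---|---|---|---|
| C1 | -5.2pp | -0.4pp | +4.8pp | Closed |
| C2 | -11.5pp | -4.9pp | +6.6pp | Closed |
| C3 | -12.0pp | -4.7pp | +7.3pp | Closed |
| C4 | -16.3pp | -8.3pp | +8.0pp | Closed |
| C5 | -10.9pp | -1.9pp | +9.0pp | Closed |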

T1's gap improvements range from +4.8 pp (vs C1) to +9.0 pp (vs C5). The largest improvements are against the controls that declined most, which is consistent with a scenario where those controls fell further on a harder end-term exam while T1 held its ground.
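Each change in gap is arithmetically the treatment group's own change minus the control's change, so the ordering of T1's improvements follows directly from how far each control fell. A minimal sketch, assuming the labels C1 to C5 follow the row order of the tables (variable names are illustrative):

```python
# Change per group in pp (end-term minus mid-term), from the Raw Score Summary.
# Assumption: labels C1..C5 follow the row order of the tables.
t1_change = 1.7
control_changes = {"C1": -3.1, "C2": -4.9, "C3": -5.6, "C4": -6.3, "C5": -7.3}

# Identity: (T_end - C_end) - (T_mid - C_mid) == (T_end - T_mid) - (C_end - C_mid)
t1_gap_changes = {
    control: round(t1_change - change, 1)
    for control, change in control_changes.items()
}

# Controls ordered from smallest to largest decline, and from smallest to largest T1 gain.
by_decline = sorted(control_changes, key=control_changes.get, reverse=True)
by_gain = sorted(t1_gap_changes, key=t1_gap_changes.get)

print(t1_gap_changes)
print("Same ordering:", by_decline == by_gain)
```

The two orderings coincide by construction, which is worth keeping in mind when reading the size of each improvement as evidence about the intervention.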

3.2 T2 vs Control Groups

T2 began near the middle of the cohort distribution (42.1%). It similarly narrowed the gap with every control group and, in the case of C1, overtook it by end-term.

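| Control | Mid-Term Gap | End-Term Gap | Change in Gap | Direction |
|---|---|---|---|---|
| C1 | -0.2pp | +0.8pp | +1.0pp | Closed / Overtook |
| C2 | -6.5pp | -3.7pp | +2.8pp | Closed |
| C3 | -7.0pp | -3.5pp | +3.5pp | Closed |
| C4 | -11.3pp | -7.1pp | +4.2pp | Closed |
| C5 | -5.9pp | -0.7pp | +5.2pp | Closed |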

T2's gap improvements range from +1.0 pp (vs C1, where it began essentially level) to +5.2 pp (vs C5). The pattern is directionally consistent with T1, though the magnitudes are smaller, which may reflect T2's higher starting position leaving less room for relative gain.

Charts for the above

Key Findings

Universal consistency: All 10 treatment-control pairings show the gap moving in the same direction. Not one pairing shows a treatment group falling behind a control group relative to its starting position. This uniformity is the strongest signal in the dataset.
T2 overtook C1: T2 began the study 0.2 pp behind C1 and ended 0.8 pp ahead — the only instance in this dataset where a treatment group outscored a control group it had previously trailed.
T1 showed larger improvements than T2 across all pairings. T1 started lower and gained more ground, consistent with either a stronger intervention effect on lower-baseline students, or a floor/ceiling dynamic in the scoring distribution.
The improvement in differentials is more pronounced against C3, C4, and C5 than against C1, tracking how far each control fell: C1, the lowest-scoring control at mid-term, declined least (3.1 pp), while C5 declined most (7.3 pp). This is expected if the end-term paper was harder, since controls with higher mid-term scores generally had further to fall, widening the relative benefit for treatment groups.
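The findings above can be checked against the raw averages with a short sketch. It assumes the group labels follow the row order of the tables, and all names are illustrative. Because every change in gap is a treatment change minus a control change, the ten pairings derive from seven group-level changes, and T1's advantage over T2 is the same against every control.

```python
from itertools import product

# Cohort average scores (%) as (mid-term, end-term), from the Raw Score Summary.
# Assumption: group labels follow the row order of the tables.
scores = {
    "T1": (37.1, 38.8), "T2": (42.1, 40.0),
    "C1": (42.3, 39.2), "C2": (48.6, 43.7), "C3": (49.1, 43.5),
    "C4": (53.4, 47.1), "C5": (48.0, 40.7),
}
treatments = ["T1", "T2"]
controls = ["C1", "C2", "C3", "C4", "C5"]


def change_in_gap(treatment: str, control: str) -> float:
    """End-term gap minus mid-term gap for one treatment-control pairing (pp)."""
    (t_mid, t_end), (c_mid, c_end) = scores[treatment], scores[control]
    return round((t_end - c_end) - (t_mid - c_mid), 1)


pairings = {(t, c): change_in_gap(t, c) for t, c in product(treatments, controls)}
all_positive = all(value > 0 for value in pairings.values())

# T1's change in gap minus T2's change in gap, per control.
t1_minus_t2 = {
    c: round(pairings[("T1", c)] - pairings[("T2", c)], 1) for c in controls
}

print("All pairings positive:", all_positive)
print("T1 minus T2 by control:", t1_minus_t2)
```

The constant difference of 3.8 pp is simply T1's change (+1.7) minus T2's change (-2.1), so the ten pairings should not be read as ten independent confirmations.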

Caveats & Limitations

Non-standardised exams: The mid-term and end-term were not scientifically calibrated for difficulty. If the end-term was simply harder, the relative gains seen here could partly reflect differential resilience to difficulty rather than the intervention itself.
Baseline imbalance: Both treatment groups started lower than every control group (T2 only marginally below C1). Regression to the mean, the statistical tendency for low scorers to rise relative to high scorers over time, cannot be ruled out as a partial explanation.
No statistical testing: No significance tests were applied. The observed differences could fall within normal sampling variation, particularly if group sizes are small.
Unknown confounders: Group assignment method, class sizes, teacher quality, and socioeconomic composition are unknown and could independently influence results.
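As an illustration of what a first significance check could look like, the sketch below runs an exact permutation test on the seven group-level changes. It rests on strong assumptions that the data cannot confirm: each cohort is treated as a single observation, the labels are treated as if they had been assigned at random, and group labels are assumed to follow the row order of the tables. It ignores student-level variation and does nothing to address regression to the mean, so it is not a substitute for a properly designed test.

```python
from itertools import combinations

# Change per group in pp (end-term minus mid-term), from the Raw Score Summary.
# Assumption: group labels follow the row order of the tables.
changes = {
    "T1": 1.7, "T2": -2.1,
    "C1": -3.1, "C2": -4.9, "C3": -5.6, "C4": -6.3, "C5": -7.3,
}
treated = ("T1", "T2")


def mean_difference(treated_groups: tuple) -> float:
    """Mean change of the 'treated' groups minus mean change of the rest."""
    t = [changes[g] for g in treated_groups]
    c = [value for g, value in changes.items() if g not in treated_groups]
    return sum(t) / len(t) - sum(c) / len(c)


observed = mean_difference(treated)

# Every way of labelling two of the seven groups as 'treated'.
relabelings = list(combinations(changes, 2))
as_extreme = sum(mean_difference(pair) >= observed for pair in relabelings)
p_value = as_extreme / len(relabelings)

print(f"observed difference: {observed:.2f} pp")
print(f"{as_extreme} of {len(relabelings)} relabelings are at least as extreme")
print(f"one-sided permutation p-value: {p_value:.3f}")
```

With only seven groups there are 21 possible labellings, so the smallest p-value this test can ever return is 1/21. That ceiling illustrates how little the group averages alone can establish.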

Conclusion

The differential analysis presents a consistent and directionally clear picture: both treatment groups improved their relative standing against every control group between mid-term and end-term, across all 10 pairings. This pattern is difficult to explain through a uniform shift in exam difficulty alone, since a paper that lowered every group's score by the same number of points would leave the gaps unchanged.

However, this analysis falls short of demonstrating causal effectiveness. The most defensible interpretation is that the SLAM Labs intervention is a promising candidate for further, more rigorous study — ideally involving standardised assessments, randomised group assignment, and a formal Difference-in-Differences statistical framework to control for baseline differences.
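For orientation, the simplest form of a Difference-in-Differences estimate can be sketched from the cohort averages alone. This is a point estimate under stated assumptions: cohorts are weighted equally because class sizes are unknown, group labels follow the row order of the tables, and the usual parallel-trends assumption is taken on trust. It carries no standard error and is not the formal analysis recommended above.

```python
from statistics import mean

# Cohort average scores (%) as (mid-term, end-term), from the Raw Score Summary.
# Assumption: group labels follow the row order of the tables.
treatment = {"T1": (37.1, 38.8), "T2": (42.1, 40.0)}
control = {
    "C1": (42.3, 39.2), "C2": (48.6, 43.7), "C3": (49.1, 43.5),
    "C4": (53.4, 47.1), "C5": (48.0, 40.7),
}


def period_means(groups: dict) -> tuple:
    """Unweighted mean of cohort averages at mid-term and at end-term."""
    mid = mean(m for m, _ in groups.values())
    end = mean(e for _, e in groups.values())
    return mid, end


t_mid, t_end = period_means(treatment)
c_mid, c_end = period_means(control)

# DiD = (treatment change over time) - (control change over time)
did_estimate = round((t_end - t_mid) - (c_end - c_mid), 2)

print(f"treatment: {t_mid:.2f}% -> {t_end:.2f}%")
print(f"control:   {c_mid:.2f}% -> {c_end:.2f}%")
print(f"difference-in-differences: {did_estimate:+.2f} pp")
```

A regression-based version with student-level scores would additionally give a confidence interval and allow covariates such as baseline score to be included.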

Analysis prepared based on cohort average scores provided for the September 2025 mid-term and March 2026 end-term examinations.