Evidence

1. Study Overview

This analysis examines the academic performance of seven Grade 7 Mathematics cohorts across two examination periods. Two groups (T1 and T2) received the SLAM Labs intervention; five groups (C1–C5) served as controls. All groups sat the same mid-term exam in September 2025 and the same end-term exam in March 2026.

The exams were not psychometrically standardised, so directly comparing absolute scores across the two time points has inherent limitations. This report therefore focuses on differential analysis, which examines how the gap between treatment and control groups changed over time, as a more robust way to detect any intervention signal.

2. Raw Score Summary

The table below shows average scores for all seven groups at each time point, along with the change in score from mid-term to end-term.

Group   Type        Mid-Term   End-Term   Change (pp)   Intervention
T1      Treatment   37.1%      38.8%      +1.7          SLAM Labs
T2      Treatment   42.1%      40.0%      -2.1          SLAM Labs
C1      Control     42.3%      39.2%      -3.1          None
C2      Control     48.6%      43.7%      -4.9          None
C3      Control     49.1%      43.5%      -5.6          None
C4      Control     53.4%      47.1%      -6.3          None
C5      Control     48.0%      40.7%      -7.3          None

Note: pp = percentage points. Scores fell in every group except T1, plausibly reflecting a harder end-term paper, increased curriculum difficulty, or both.

The most notable feature of the raw data is that every group except T1 declined from mid-term to end-term; T1 alone improved, by 1.7 percentage points. This broad decline makes absolute score changes unreliable as evidence: if the end-term exam was harder, all groups would fall regardless of the intervention, so a treatment group whose score dropped (such as T2) could look as though it underperformed when it may not have.
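The Change column can be reproduced directly from the two sets of cohort averages. The short Python sketch below does so and lists the groups whose average fell. The dictionary and function names are illustrative only and are not taken from any existing analysis code.

```python
# Cohort average scores (%) copied from the Raw Score Summary table.
MID_TERM = {"T1": 37.1, "T2": 42.1, "C1": 42.3, "C2": 48.6,
            "C3": 49.1, "C4": 53.4, "C5": 48.0}
END_TERM = {"T1": 38.8, "T2": 40.0, "C1": 39.2, "C2": 43.7,
            "C3": 43.5, "C4": 47.1, "C5": 40.7}


def score_changes(mid, end):
    """End-term minus mid-term average for each group, in pp (rounded to 0.1)."""
    return {group: round(end[group] - mid[group], 1) for group in mid}


changes = score_changes(MID_TERM, END_TERM)
declined = sorted(group for group, delta in changes.items() if delta < 0)

for group, delta in changes.items():
    print(f"{group}: {delta:+.1f} pp")
print("Declined:", ", ".join(declined))
```

T1 is the only group with a positive change (+1.7 pp). The other six groups all appear in the declined list.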

3. Differential Analysis

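To account for possible differences in exam difficulty, we instead compare each treatment group with each control group at each time point. The gap is the treatment group's average minus the control group's average, so a negative gap means the treatment group trails. The change in gap is the end-term gap minus the mid-term gap. A positive change means the treatment group gained ground on that control: its relative performance improved, regardless of any score shift that affected both groups equally. This is, in effect, a simple difference-in-differences calculation on group averages.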
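The gap calculation can be reproduced from the cohort averages in the raw score table. This Python sketch computes the mid-term gap, end-term gap and change in gap for all ten treatment-control pairings. The helper `gap_table` and the constant names are hypothetical and serve only as illustration.

```python
from itertools import product

# Cohort average scores (%) from the Raw Score Summary table.
MID_TERM = {"T1": 37.1, "T2": 42.1, "C1": 42.3, "C2": 48.6,
            "C3": 49.1, "C4": 53.4, "C5": 48.0}
END_TERM = {"T1": 38.8, "T2": 40.0, "C1": 39.2, "C2": 43.7,
            "C3": 43.5, "C4": 47.1, "C5": 40.7}
TREATMENTS = ["T1", "T2"]
CONTROLS = ["C1", "C2", "C3", "C4", "C5"]


def gap_table(mid, end, treatments, controls):
    """Map each (treatment, control) pair to (mid gap, end gap, change in gap) in pp.

    Gap = treatment average minus control average, so a negative gap means the
    treatment group trails. A positive change means it gained relative ground.
    """
    rows = {}
    for t, c in product(treatments, controls):
        mid_gap = round(mid[t] - mid[c], 1)
        end_gap = round(end[t] - end[c], 1)
        rows[(t, c)] = (mid_gap, end_gap, round(end_gap - mid_gap, 1))
    return rows


table = gap_table(MID_TERM, END_TERM, TREATMENTS, CONTROLS)
for (t, c), (mid_gap, end_gap, change) in table.items():
    print(f"{t} vs {c}: {mid_gap:+.1f} -> {end_gap:+.1f} (change {change:+.1f} pp)")
```

Each printed line should match a row of the T1 or T2 tables. Rounding to one decimal place absorbs the floating-point noise that comes from subtracting values such as 37.1 and 42.3.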

3.1 T1 vs Control Groups

T1 began the study with the lowest score of all seven cohorts (37.1%). By end-term, T1 had narrowed its gap to every control group, although it still trailed all five.

Control   Mid-Term Gap   End-Term Gap   Change in Gap   Direction
C1        -5.2 pp        -0.4 pp        +4.8 pp         Closed
C2        -11.5 pp       -4.9 pp        +6.6 pp         Closed
C3        -12.0 pp       -4.7 pp        +7.3 pp         Closed
C4        -16.3 pp       -8.3 pp        +8.0 pp         Closed
C5        -10.9 pp       -1.9 pp        +9.0 pp         Closed

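T1's gap improvements range from +4.8 pp (vs C1) to +9.0 pp (vs C5). Their ordering follows how far each control fell, from C1 (-3.1 pp) to C5 (-7.3 pp), rather than each control's mid-term level: C5 started below C2, C3 and C4 but shows the largest improvement. This is consistent with a scenario where the controls fell further on a harder end-term exam while T1 held its ground.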
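Rearranging the definition shows why the ordering follows the controls' declines: (T_end − C_end) − (T_mid − C_mid) = (T_end − T_mid) − (C_end − C_mid). In other words, the change in gap is the treatment group's own change minus the control's own change. The Python sketch below checks this identity for T1. It then compares three orderings of the controls: by T1's gain, by how far the control fell, and by mid-term score. The helper names are illustrative.

```python
# Cohort average scores (%) from the Raw Score Summary table.
MID_TERM = {"T1": 37.1, "T2": 42.1, "C1": 42.3, "C2": 48.6,
            "C3": 49.1, "C4": 53.4, "C5": 48.0}
END_TERM = {"T1": 38.8, "T2": 40.0, "C1": 39.2, "C2": 43.7,
            "C3": 43.5, "C4": 47.1, "C5": 40.7}
CONTROLS = ["C1", "C2", "C3", "C4", "C5"]


def change(group):
    """Score change for one group (end-term minus mid-term), in pp."""
    return END_TERM[group] - MID_TERM[group]


def change_in_gap(treatment, control):
    """End-term gap minus mid-term gap, rounded to 0.1 pp."""
    end_gap = END_TERM[treatment] - END_TERM[control]
    mid_gap = MID_TERM[treatment] - MID_TERM[control]
    return round(end_gap - mid_gap, 1)


# Rearranged definition: change in gap = treatment change - control change.
for c in CONTROLS:
    assert change_in_gap("T1", c) == round(change("T1") - change(c), 1)

by_gain = sorted(CONTROLS, key=lambda c: change_in_gap("T1", c))  # smallest gain first
by_decline = sorted(CONTROLS, key=lambda c: -change(c))            # smallest fall first
by_baseline = sorted(CONTROLS, key=MID_TERM.get)                   # lowest mid-term first

print("Ordered by T1's gain:     ", by_gain)
print("Ordered by control's fall:", by_decline)
print("Ordered by mid-term score:", by_baseline)
```

Both `by_gain` and `by_decline` come out as C1, C2, C3, C4, C5. Ordering by mid-term score gives C1, C5, C2, C3, C4.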

3.2 T2 vs Control Groups

T2 began with the second-lowest score of all seven cohorts (42.1%), just 0.2 pp below C1. It likewise narrowed its gap to every control group and, in the case of C1, overtook it by end-term.

Control   Mid-Term Gap   End-Term Gap   Change in Gap   Direction
C1        -0.2 pp        +0.8 pp        +1.0 pp         Closed / overtook
C2        -6.5 pp        -3.7 pp        +2.8 pp         Closed
C3        -7.0 pp        -3.5 pp        +3.5 pp         Closed
C4        -11.3 pp       -7.1 pp        +4.2 pp         Closed
C5        -5.9 pp        -0.7 pp        +5.2 pp         Closed

T2's gap improvements range from +1.0 pp (vs C1, where it began essentially level) to +5.2 pp (vs C5). The pattern is directionally consistent with T1, but each improvement is exactly 3.8 pp smaller. That difference equals T1's own change (+1.7 pp) minus T2's (-2.1 pp), and it may reflect T2's higher starting position leaving less room for relative gain.
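When T1's and T2's results against the same control are subtracted, the control's own change cancels. The difference between their gap improvements should therefore be identical in every row. The following quick Python check uses illustrative names.

```python
# Cohort average scores (%) from the Raw Score Summary table.
MID_TERM = {"T1": 37.1, "T2": 42.1, "C1": 42.3, "C2": 48.6,
            "C3": 49.1, "C4": 53.4, "C5": 48.0}
END_TERM = {"T1": 38.8, "T2": 40.0, "C1": 39.2, "C2": 43.7,
            "C3": 43.5, "C4": 47.1, "C5": 40.7}
CONTROLS = ["C1", "C2", "C3", "C4", "C5"]


def change_in_gap(treatment, control):
    """End-term gap minus mid-term gap, rounded to 0.1 pp."""
    end_gap = END_TERM[treatment] - END_TERM[control]
    mid_gap = MID_TERM[treatment] - MID_TERM[control]
    return round(end_gap - mid_gap, 1)


# How much larger T1's improvement is than T2's, control by control.
offsets = {c: round(change_in_gap("T1", c) - change_in_gap("T2", c), 1) for c in CONTROLS}

# The control's change cancels, so every offset should equal T1's change minus T2's.
t1_change = END_TERM["T1"] - MID_TERM["T1"]
t2_change = END_TERM["T2"] - MID_TERM["T2"]
expected_offset = round(t1_change - t2_change, 1)

print(offsets)
print("T1 change minus T2 change:", expected_offset)
```

Every entry in `offsets` is 3.8 pp. This means the T1 and T2 results add only one comparison (T1's change against T2's), not five separate ones.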

[Charts: gap changes for T1 and T2 against each control group]

4. Key Findings

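  • Universal consistency: In all 10 treatment-control pairings, the treatment group gained ground. No pairing shows a treatment group falling further behind a control group than it started. This uniformity is the clearest signal in the dataset. However, the 10 pairings are not independent observations. They all follow from one fact: each treatment group's change (+1.7 pp for T1, -2.1 pp for T2) was better than every control group's (-3.1 pp to -7.3 pp).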

  • T2 overtook C1: T2 began the study 0.2 pp behind C1 and ended 0.8 pp ahead — the only instance in this dataset where a treatment group outscored a control group it had previously trailed.

  • T1 showed larger improvements than T2 in every pairing, by 3.8 pp each time. T1 started lower and gained more ground. This is consistent with either a stronger intervention effect on lower-baseline students or a floor/ceiling effect in the score distribution.

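  • The improvement in differentials is largest against the controls that declined most (C5, C4, C3) and smallest against C1, which declined least. Each treatment group's own change is the same in every pairing, so the change in gap equals the treatment group's change minus the control's change, and this ordering mirrors the controls' declines. That fits a harder end-term paper hitting higher-scoring groups hardest. Mid-term level is an imperfect guide, though: C5 fell furthest despite starting below C2, C3 and C4.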
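The first two findings can be checked mechanically from the cohort averages. The Python sketch below counts the pairings in which the treatment group gained ground and lists any pairing where a treatment group moved from behind to ahead. It also tests the condition that makes the uniformity unavoidable: the worst treatment-group change must still be better than the best control-group change. Variable names are illustrative.

```python
from itertools import product

# Cohort average scores (%) from the Raw Score Summary table.
MID_TERM = {"T1": 37.1, "T2": 42.1, "C1": 42.3, "C2": 48.6,
            "C3": 49.1, "C4": 53.4, "C5": 48.0}
END_TERM = {"T1": 38.8, "T2": 40.0, "C1": 39.2, "C2": 43.7,
            "C3": 43.5, "C4": 47.1, "C5": 40.7}
TREATMENTS = ["T1", "T2"]
CONTROLS = ["C1", "C2", "C3", "C4", "C5"]

pairs = list(product(TREATMENTS, CONTROLS))

# A pairing gains ground when the end-term gap is larger (less negative) than the mid-term gap.
gained = [(t, c) for t, c in pairs
          if END_TERM[t] - END_TERM[c] > MID_TERM[t] - MID_TERM[c]]

# An overtake: the treatment group trailed the control at mid-term and led it at end-term.
overtook = [(t, c) for t, c in pairs
            if MID_TERM[t] < MID_TERM[c] and END_TERM[t] > END_TERM[c]]

# All pairings gain ground exactly when the worst treatment change beats the best control change.
worst_treatment_change = min(END_TERM[t] - MID_TERM[t] for t in TREATMENTS)
best_control_change = max(END_TERM[c] - MID_TERM[c] for c in CONTROLS)

print(f"{len(gained)} of {len(pairs)} pairings gained ground")
print("Overtakes:", overtook)
print(f"Worst treatment change {worst_treatment_change:+.1f} pp vs "
      f"best control change {best_control_change:+.1f} pp")
```

All 10 pairings gain ground. The only overtake is T2 vs C1, and T2's -2.1 pp still beats C1's -3.1 pp.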

5. Caveats & Limitations

  • Non-standardised exams: The mid-term and end-term were not scientifically calibrated for difficulty. If the end-term was simply harder, the relative gains seen here could partly reflect differential resilience to difficulty rather than the intervention itself.

  • Baseline imbalance: Treatment groups started lower than most control groups (T1 lowest of all seven, T2 second lowest). Regression to the mean cannot be ruled out as a partial explanation. This is the tendency for groups that score unusually low or high on one test to score closer to the average on the next.

  • No statistical testing: No significance tests were applied. The observed differences could fall within normal sampling variation, particularly if group sizes are small.

  • Unknown confounders: Group assignment method, class sizes, teacher quality, and socioeconomic composition are unknown and could independently influence results.

6. Conclusion

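The differential analysis presents a consistent and directionally clear picture: across all 10 pairings, both treatment groups improved their relative standing against every control group between mid-term and end-term. This pattern is difficult to explain through exam difficulty alone. A uniformly harder paper would be expected to lower every group by a similar number of points, or by a similar proportion of its starting score. Instead, T1 rose, and T2 fell by about 5% of its mid-term average, less than any control group (about 7% to 15%).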
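The "similar proportion" reading of a harder paper can be tested by expressing each group's change as a percentage of its own mid-term average. The following Python sketch does this from the cohort averages. The helper name is illustrative.

```python
# Cohort average scores (%) from the Raw Score Summary table.
MID_TERM = {"T1": 37.1, "T2": 42.1, "C1": 42.3, "C2": 48.6,
            "C3": 49.1, "C4": 53.4, "C5": 48.0}
END_TERM = {"T1": 38.8, "T2": 40.0, "C1": 39.2, "C2": 43.7,
            "C3": 43.5, "C4": 47.1, "C5": 40.7}
CONTROLS = ["C1", "C2", "C3", "C4", "C5"]


def relative_change(group):
    """Change as a percentage of the group's own mid-term average, rounded to 0.1."""
    return round(100 * (END_TERM[group] - MID_TERM[group]) / MID_TERM[group], 1)


relative = {group: relative_change(group) for group in MID_TERM}
control_range = (min(relative[c] for c in CONTROLS), max(relative[c] for c in CONTROLS))

for group, pct in relative.items():
    print(f"{group}: {pct:+.1f}% of mid-term average")
print("Control range:", control_range)
```

T1 comes out at +4.6% and T2 at -5.0%. The controls range from -7.3% (C1) to -15.2% (C5). Even on a proportional view of exam difficulty, both treatment groups therefore fared better than every control.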

However, this analysis falls short of demonstrating causal effectiveness. The most defensible interpretation is that the SLAM Labs intervention is a promising candidate for further, more rigorous study — ideally involving standardised assessments, randomised group assignment, and a formal Difference-in-Differences statistical framework to control for baseline differences.
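For reference, a textbook two-period difference-in-differences model is sketched below. Here y is a student's score, Treat_g marks SLAM Labs groups, Post_t marks the end-term exam, and δ is the intervention estimate. The student-level setup is an assumption, since this report works only from cohort averages. Applied to group averages, δ̂ reduces to the "change in gap" used in the differential analysis above. The formal framework adds standard errors and makes the parallel-trends assumption explicit: that treatment and control groups would have moved alike without the intervention.

```latex
% Hypothetical student-level model: student i, group g, period t
y_{igt} = \alpha + \beta\,\mathrm{Treat}_g + \gamma\,\mathrm{Post}_t
        + \delta\,(\mathrm{Treat}_g \times \mathrm{Post}_t) + \varepsilon_{igt}

% With group averages only, the estimate for one treatment-control pair is
\hat{\delta} = (\bar{y}_{T,\mathrm{end}} - \bar{y}_{T,\mathrm{mid}})
             - (\bar{y}_{C,\mathrm{end}} - \bar{y}_{C,\mathrm{mid}})
```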

Analysis prepared based on cohort average scores provided for the September 2025 mid-term and March 2026 end-term examinations.

SLAM Labs

Transforming math education through AI-powered personalized learning experiences.

© 2025 FuturEd Trust | SLAM Labs. All rights reserved.

Made with ❤️ for the future of education
