Calibrated Human-in-the-Loop Short-Answer Grading

A fine-tuned language model grades student responses and emits a temperature-scaled confidence score. High-confidence predictions are auto-graded; low-confidence ones are flagged for human review. Attribution highlights the answer tokens that most influenced the grade.
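
The routing described above can be illustrated with a minimal sketch. It assumes the model produces one logit per grade class and that a temperature has already been fitted on held-out data. The names `softmax_with_temperature` and `route_prediction`, and the default values `temperature=1.5` and `threshold=0.8`, are hypothetical and not taken from the demo.

```python
import math
from typing import List, Tuple


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def route_prediction(
    logits: List[float],
    temperature: float = 1.5,  # assumed value; normally fitted on a validation set
    threshold: float = 0.8,    # assumed cut-off between auto-grading and review
) -> Tuple[int, float, str]:
    """Return (predicted grade index, confidence, 'auto' or 'review')."""
    probs = softmax_with_temperature(logits, temperature)
    grade = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[grade]
    decision = "auto" if confidence >= threshold else "review"
    return grade, confidence, decision


if __name__ == "__main__":
    print(route_prediction([4.0, 0.0, 0.0], temperature=1.0))
    print(route_prediction([1.0, 0.8, 0.0], temperature=1.0))
```

A temperature above 1 flattens the distribution and lowers the reported confidence, so with a fixed threshold more responses go to human review.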

Outputs: Predicted Grade, Confidence, and Token Attribution (Gradient × Input: the tokens most influential to this grade, on a scale from low to high attribution).
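
As a rough illustration of Gradient × Input, the sketch below multiplies the gradient of a score with respect to each token embedding by that embedding and sums over the embedding dimensions, giving one attribution value per token. It is not the demo's implementation: the gradient is approximated with central finite differences instead of autograd so the example runs with the standard library alone, and `gradient_x_input`, `toy_score`, and the toy weights are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]  # one embedding vector (row) per token


def gradient_x_input(
    score_fn: Callable[[Matrix], float],
    embeddings: Matrix,
    eps: float = 1e-5,
) -> List[float]:
    """Per-token attribution: sum over dimensions of d(score)/d(x) * x."""
    attributions = []
    for t, vec in enumerate(embeddings):
        total = 0.0
        for d, value in enumerate(vec):
            # Perturb a single embedding entry in both directions.
            plus = [row[:] for row in embeddings]
            minus = [row[:] for row in embeddings]
            plus[t][d] = value + eps
            minus[t][d] = value - eps
            grad = (score_fn(plus) - score_fn(minus)) / (2 * eps)
            total += grad * value
        attributions.append(total)
    return attributions


def toy_score(embeddings: Matrix) -> float:
    """Hypothetical stand-in for the grade logit: a linear score over all tokens."""
    weights = [1.0, -2.0]
    return sum(w * x for row in embeddings for w, x in zip(weights, row))


if __name__ == "__main__":
    tokens = ["mitochondria", "produce", "energy"]
    embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    for token, score in zip(tokens, gradient_x_input(toy_score, embeddings)):
        print(f"{token}: {score:+.3f}")
```

With a real fine-tuned model the gradient would come from backpropagation through the network, and the per-token values would typically be normalized before being mapped onto the low-to-high attribution scale.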