Calibration aligns model scores with event frequencies. For a binary outcome $Y\in\{0,1\}$ and a score $S$, the calibration function is $g(s)=\mathbb{E}[Y\mid S=s]$. Post hoc calibration estimates $g$ on a holdout set and applies the estimate to future scores.
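As a minimal sketch of post hoc calibration, one simple estimator of $g$ is histogram binning. It splits the score range into equal-width bins, fits on holdout scores and labels by taking the mean label in each bin, and returns a function $\hat g$ that maps a future score to its bin's mean. The helper name `fit_histogram_binning`, the assumption that scores lie in $[0,1]$, the default of 10 bins, and the fallback to the overall holdout base rate for empty bins are illustrative choices, not taken from the text. Platt scaling and isotonic regression are common alternative estimators.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def fit_histogram_binning(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> Callable[[float], float]:
    """Estimate g(s) = E[Y | S = s] by averaging holdout labels per score bin.

    Hypothetical helper: assumes scores lie in [0, 1] and uses equal-width bins.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("need equally many (non-zero) scores and labels")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")

    # Interior bin edges 1/n, 2/n, ..., (n-1)/n; bisect_right maps a score to its bin index.
    edges = [i / n_bins for i in range(1, n_bins)]
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = bisect_right(edges, s)
        sums[b] += y
        counts[b] += 1

    # Assumption: an empty bin falls back to the overall holdout base rate.
    base_rate = sum(labels) / len(labels)
    bin_means = [sums[b] / counts[b] if counts[b] else base_rate for b in range(n_bins)]

    def g_hat(s: float) -> float:
        # Scores below 0 or above 1 land in the first or last bin, respectively.
        return bin_means[bisect_right(edges, s)]

    return g_hat


# Usage: fit on a (toy) holdout set, then apply the estimate to future scores.
holdout_scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
holdout_labels = [0, 0, 1, 1, 1, 0]
g_hat = fit_histogram_binning(holdout_scores, holdout_labels, n_bins=2)
print(g_hat(0.25), g_hat(0.95))
```

With two bins, each estimate is the fraction of positives among the holdout scores in that bin (1/3 below 0.5 and 2/3 above it). The resulting $\hat g$ is piecewise constant, and the bin count trades resolution against the number of holdout examples available per bin.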
A common difficulty