Automation can be benchmarked using confidence scores, which indicate how certain the model is about each prediction. By setting a confidence threshold, we can measure the proportion of data the model handles without human intervention (predictions at or above the threshold) and how accurate those automated predictions are. Applying the same thresholds to different models gives an objective basis for comparing their automation capability.
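The measurement described above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: the function name `automation_at_threshold`, the `>=` comparison, and the sample data are all assumptions made for the example. It takes per-prediction confidence scores and correctness flags, then reports the share of predictions that clear the threshold and the accuracy within that automated share.

```python
from typing import Optional, Sequence, Tuple


def automation_at_threshold(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, Optional[float]]:
    """Return (automation_rate, automated_accuracy) for one threshold.

    Hypothetical helper: a prediction counts as "automated" when its
    confidence is at or above the threshold (an assumed convention).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    # Keep the correctness flags of predictions that clear the threshold.
    automated = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]

    total = len(confidences)
    automation_rate = len(automated) / total if total else 0.0
    # Accuracy is undefined when nothing is automated, so return None.
    automated_accuracy = sum(automated) / len(automated) if automated else None
    return automation_rate, automated_accuracy


# Made-up sample data, for illustration only.
confidences = [0.95, 0.80, 0.60, 0.99, 0.40]
correct = [True, True, False, True, False]

for threshold in (0.5, 0.75, 0.9):
    rate, accuracy = automation_at_threshold(confidences, correct, threshold)
    print(f"threshold={threshold}: automated={rate:.0%}, accuracy={accuracy}")
```

Running the same sweep of thresholds for each model on a shared evaluation set gives comparable pairs of automation rate and accuracy. Raising the threshold typically lowers the automation rate while raising accuracy on the automated portion, so the comparison is most informative at a fixed target accuracy.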