Here's a framework for prompt optimization:
Defining Success: Metrics and Evaluation Criteria
Before collecting any data, establish what success looks like for your specific use case. Choose a primary metric that directly reflects business value: accuracy for balanced classification, F1 for imbalanced datasets, BLEU/ROUGE for generation tasks, or custom domain-specific
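
To make the metric choice concrete, here is a minimal sketch of why accuracy can mislead on imbalanced data while F1 does not. The helper names (`accuracy`, `f1_score`) and the toy labels are made up for illustration and are not part of any particular framework; in practice a library such as scikit-learn would likely be used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical evaluation set: 9 negatives and 1 positive (imbalanced).
y_true = [0] * 9 + [1]

# A degenerate "prompt" whose outputs always map to the negative class.
always_negative = [0] * 10

print(accuracy(y_true, always_negative))  # 0.9
print(f1_score(y_true, always_negative))  # 0.0
```

The always-negative baseline scores 90% accuracy while never finding the one positive case, which is why F1 is the safer primary metric when classes are skewed.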