User interaction only after non-improvement
This adds the following new behaviour to `--judge-action=correct`:
- All generations are first run with `judge_action = "skip"`
- If the best performance of any prompt did not improve in a generation, the generation is rerun with `judge_action = "correct"` to get the human feedback
- If the generation still did not yield a better prompt, the algorithm continues as before.
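
The control flow described above can be sketched roughly as follows. This is only an illustration, not the code in this merge request: `evolve`, `run_generation`, `initial_best` and the returned log are hypothetical names, and the sketch assumes that a higher score is better and that `run_generation` returns the best score reached in that generation.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: runs one full generation (including evaluation)
# with the given judge action and returns the best score reached.
RunGeneration = Callable[[int, str], float]


def evolve(
    run_generation: RunGeneration,
    num_generations: int,
    initial_best: float,
) -> Tuple[float, List[Tuple[int, str]]]:
    """Run every generation with "skip" first; rerun with "correct" only if it did not improve."""
    best = initial_best
    log: List[Tuple[int, str]] = []  # which judge actions were used per generation

    for generation in range(num_generations):
        # First attempt: no human in the loop.
        score = run_generation(generation, "skip")
        log.append((generation, "skip"))

        if score <= best:
            # No improvement: rerun the same generation with human feedback.
            score = run_generation(generation, "correct")
            log.append((generation, "correct"))

        # If the rerun did not help either, simply continue as before.
        best = max(best, score)

    return best, log
```

In this sketch the human is asked at most once per generation, and only after a full autonomous run has failed to beat the best score so far.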
The motivation for this change is as follows: Human feedback is prohibitively expensive, so we need to request it sparingly. As suggested by Thang, a good strategy would be to only use the feedback if the algorithm could not improve the prompt by itself. Since we only know how well a prompt performed after a full evaluation, we need to run the whole generation before we know whether it has to be rerun with human feedback. Ideally we would have an oracle that could tell us if human feedback will be necessary before running the whole process, but since no such function exists, we use the lack of improvement after a full generation as a proxy.
Edited by Grießhaber Daniel