User interaction only after non improvement (!12) · Merge requests · Grießhaber Daniel / evoprompt

Grießhaber Daniel requested to merge user-interaction-after-non-improvement into master Jan 02, 2025

This adds the following new behaviour to --judge-action=correct:

Every generation is first run with judge_action = "skip".
If the best performance across all prompts did not improve in that generation, the generation is rerun with judge_action = "correct" to collect human feedback.
If the rerun still does not yield a better prompt, the algorithm continues as before.
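
The steps above can be sketched roughly as follows. This is only an illustration of the control flow, not the code in this merge request: `evolve`, `run_generation`, and the `(generation, action)` log are hypothetical names, and the sketch assumes that a generation can be rerun from the same starting state and reports a single best score.

```python
from typing import Callable, List, Tuple

# Hypothetical callable: runs one generation with the given judge action
# ("skip" or "correct") and returns the best score found in that generation.
RunGeneration = Callable[[int, str], float]


def evolve(
    run_generation: RunGeneration,
    num_generations: int,
    initial_best: float = float("-inf"),
) -> Tuple[float, List[Tuple[int, str]]]:
    """Run every generation autonomously first; ask for human feedback
    only when the autonomous run did not improve on the best score so far."""
    best = initial_best
    actions: List[Tuple[int, str]] = []  # log of (generation, judge_action)

    for generation in range(num_generations):
        # Step 1: always try without human feedback first.
        score = run_generation(generation, "skip")
        actions.append((generation, "skip"))

        # Step 2: no improvement, so rerun the same generation with feedback.
        if score <= best:
            score = run_generation(generation, "correct")
            actions.append((generation, "correct"))

        # Step 3: continue as before, whether or not the rerun helped.
        best = max(best, score)

    return best, actions


# Usage with made-up scores (assumption: higher is better).
fake_scores = {
    (0, "skip"): 0.5,
    (1, "skip"): 0.5,
    (1, "correct"): 0.7,
    (2, "skip"): 0.6,
    (2, "correct"): 0.65,
}
best, actions = evolve(lambda g, a: fake_scores[(g, a)], num_generations=3)
```

In this made-up run, human feedback is requested in generations 1 and 2 only, because generation 0 improves on the initial best without it.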

The motivation for this change is as follows: human feedback is prohibitively expensive, so we should request it sparingly. As suggested by Thang, a good strategy is to use the feedback only when the algorithm could not improve the prompt on its own. Since we only know how well a prompt performs after a full evaluation, we have to run the whole generation before we know whether it needs to be rerun with human feedback. Ideally we would have an oracle that tells us whether human feedback will be necessary before running the whole process, but since no such function exists, we use the outcome of the autonomous run as a proxy.

Edited Jan 02, 2025 by Grießhaber Daniel

