Draft: User interaction without judge
The goal of this PR is to allow human feedback even without a judge. Currently, we rely on a judge to rate model outputs, and we then ask a human to correct an output if the judge rated it as bad.
Without a judge, we instead ask a human to correct the worst prompts (i.e., those from each evolution step) once a generation has gone several iterations without improvement.
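The "several iterations of non-improvement" trigger can be read as a patience counter. The following is a minimal sketch under assumptions: the PR does not state the patience value, how scores are computed, or how many worst prompts are shown, so `StagnationTrigger`, `patience`, and `worst_prompts` are hypothetical names, not this project's API.

```python
from dataclasses import dataclass


@dataclass
class StagnationTrigger:
    """Hypothetical helper: signals when the best score has not improved
    for `patience` consecutive iterations (names are illustrative)."""
    patience: int = 3
    best_score: float = float("-inf")
    stale_iterations: int = 0

    def update(self, iteration_best: float) -> bool:
        # Reset the counter whenever the best score improves.
        if iteration_best > self.best_score:
            self.best_score = iteration_best
            self.stale_iterations = 0
        else:
            self.stale_iterations += 1
        # Request human feedback once the stall reaches `patience`.
        if self.stale_iterations >= self.patience:
            self.stale_iterations = 0  # assumption: restart the count after asking
            return True
        return False


def worst_prompts(scored: list[tuple[str, float]], k: int = 1) -> list[str]:
    """Return the k lowest-scoring prompts (assumed candidates for human correction)."""
    return [prompt for prompt, _ in sorted(scored, key=lambda item: item[1])[:k]]


if __name__ == "__main__":
    trigger = StagnationTrigger(patience=2)
    print([trigger.update(s) for s in [0.5, 0.6, 0.6, 0.55]])
    print(worst_prompts([("a", 0.9), ("b", 0.1), ("c", 0.5)], k=2))
```

With `patience=2`, the scores above trigger a request only on the fourth iteration, after two iterations without beating 0.6. Resetting the counter after each request is one design choice; keeping it would ask the human again on every subsequent stalled iteration.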
Edited by Max Kimmich