Draft: User interaction without judge

The goal of this PR is to allow human feedback even when no judge is available. Currently, we rely on a judge to rate model outputs, and we ask a human to correct an output only if the judge rated it as bad.

Without a judge, we instead ask a human to correct the worst prompts (i.e., each evolution step) after several iterations without improvement in a generation.
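To make the trigger condition concrete, here is a minimal sketch of how the no-judge path could decide when to ask for human feedback. It is not the code in this PR: the names `StagnationTracker`, `patience`, and `worst_prompts` are hypothetical, and it assumes that a higher score is better and that each iteration reports a single best score.

```python
from dataclasses import dataclass


@dataclass
class StagnationTracker:
    """Counts consecutive iterations without improvement (hypothetical helper)."""

    patience: int = 3  # assumed threshold for "several iterations"
    best_score: float = float("-inf")
    stale_iterations: int = 0

    def update(self, score: float) -> bool:
        """Record an iteration's best score.

        Returns True when human feedback should be requested.
        Assumes that a higher score is better.
        """
        if score > self.best_score:
            # Improvement: remember the new best and reset the counter.
            self.best_score = score
            self.stale_iterations = 0
            return False
        self.stale_iterations += 1
        if self.stale_iterations >= self.patience:
            # Reset so the human is not asked again on the very next iteration.
            self.stale_iterations = 0
            return True
        return False


def worst_prompts(scored_prompts, k=1):
    """Return the k lowest-scoring (prompt, score) pairs, worst first."""
    return sorted(scored_prompts, key=lambda item: item[1])[:k]


if __name__ == "__main__":
    tracker = StagnationTracker(patience=2)
    population = [("prompt a", 0.9), ("prompt b", 0.2), ("prompt c", 0.5)]
    for score in [0.5, 0.6, 0.6, 0.55, 0.7]:
        if tracker.update(score):
            # In the real workflow a human would be asked to correct these.
            print("ask human to correct:", worst_prompts(population, k=1))
```

Resetting the counter after each request is a design choice in this sketch; the actual behaviour after a human correction may differ.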

Edited by Max Kimmich
