Use correct format for demonstration samples for evaluation and evolution (!8) · Merge requests · Grießhaber Daniel / evoprompt · GitLab

Snippets Groups Projects

Merged Max Kimmich requested to merge refactor-models into master 5 months ago

The goal is to use chat-style format for demonstration samples for both evaluation and evolution.

Todos:

demonstration data for evaluation
demonstration data for evolution
base prompt generation

Current results for prompt "Have your friend evaluate the movie they had just seen and provide a summary opinion (e.g. terrible, bad, okay, good, or great) to determine the sentiment of the movie review." on SST5 (dev set/test set) using 1 demonstration sample per class for comparison with reference implementation are:

AlpacaHfChat: "chavinlo/alpaca-native", no grammar: 56.5/50.7
HfChat: "meta-llama/Meta-Llama-3.1-8B-Instruct", no grammar: 53.5/56.24
LlamaChat: "QuantFactory/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct.Q8_0.gguf", no grammar: 56.5/55.7
LlamaChat: "MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF/Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf", no grammar: 56.5/56.52

Results on SST5 for Alpaca 7b ("chavinlo/alpaca-native") from the original paper are 49.91 or 52.26 depending on the table one refers to (table 1 and table 14 respectively, not sure where the difference is (both should report scores on the test set)).

The same for AG's News with prompt "Assess the entire concept of the news story and choose from the World, Sports, Business or Tech categories to categorize it into the correct category.":

AlpacaHfChat: "chavinlo/alpaca-native", no grammar: 73.5/72.33

Edited 4 months ago by Max Kimmich

Activity

Max Kimmich requested review from @griesshaber 5 months ago

requested review from @griesshaber
Max Kimmich assigned to @maximilian.kimmich 5 months ago

assigned to @maximilian.kimmich
Max Kimmich changed the description 5 months ago

changed the description
Max Kimmich changed the description 5 months ago

changed the description
Max Kimmich added 3 commits 5 months ago
added 3 commits

35575c68 - Use chat format for task evaluation demonstration samples

975b2fd8 - Increase 'max_tokens' for HfChat to reduce chance of exceeding token limit

f57cbe07 - Use chat format for evolution demonstration samples

Compare with previous version
Max Kimmich marked the checklist item demonstration data for evolution as completed 5 months ago

marked the checklist item demonstration data for evolution as completed
Max Kimmich marked the checklist item demonstration data for evolution as incomplete 5 months ago

marked the checklist item demonstration data for evolution as incomplete
Max Kimmich marked the checklist item demonstration data for evolution as completed 5 months ago

marked the checklist item demonstration data for evolution as completed
Max Kimmich added 2 commits 5 months ago
added 2 commits

5175ca2d - Fix crash which occurred when no prompt could be extracted from evolution

7e0ea34a - Adapt judge for new CoT model format

Compare with previous version
Max Kimmich added 1 commit 5 months ago
added 1 commit

3d76757d - Re-add generated base prompts

Compare with previous version
Max Kimmich added 2 commits 5 months ago
added 2 commits

06b73a3b - Rename prompt source to match defined literal

a5d48a9f - Allow to directly control temperature

Compare with previous version
Max Kimmich added 1 commit 5 months ago
added 1 commit

3931df17 - Generate more diverse names

Compare with previous version
Grießhaber Daniel @griesshaber started a thread on commit 3931df17 5 months ago

evoprompt/utils.py

         system_message=None,
         prompt=RUN_NAME_PROMPT,
         use_randomness=True,
         # a bit more randomness for the name is okay
         temperature=1.2,

Max Kimmich added 1 commit 5 months ago

added 1 commit

49a22b31 - Fix not showing all inputs in ResponseEditor

Compare with previous version

Max Kimmich marked the checklist item base prompt generation as completed 4 months ago

marked the checklist item base prompt generation as completed

Max Kimmich marked this merge request as ready 4 months ago

marked this merge request as ready

Max Kimmich mentioned in commit 4 months ago

mentioned in commit 5271cae4

Max Kimmich merged 4 months ago

merged

Please register or sign in to reply