Skip to content
Snippets Groups Projects

Use correct format for demonstration samples for evaluation and evolution

Merged Max Kimmich requested to merge refactor-models into master

The goal is to use chat-style format for demonstration samples for both evaluation and evolution.

Todos:

  • demonstration data for evaluation
  • demonstration data for evolution
  • base prompt generation

Current results for prompt "Have your friend evaluate the movie they had just seen and provide a summary opinion (e.g. terrible, bad, okay, good, or great) to determine the sentiment of the movie review." on SST5 (dev set/test set) using 1 demonstration sample per class for comparison with reference implementation are:

  • AlpacaHfChat: "chavinlo/alpaca-native", no grammar: 56.5/50.7
  • HfChat: "meta-llama/Meta-Llama-3.1-8B-Instruct", no grammar: 53.5/56.24
  • LlamaChat: "QuantFactory/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct.Q8_0.gguf", no grammar: 56.5/55.7
  • LlamaChat: "MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF/Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf", no grammar: 56.5/56.52

Results on SST5 for Alpaca 7b ("chavinlo/alpaca-native") from the original paper are 49.91 or 52.26 depending on the table one refers to (table 1 and table 14 respectively, not sure where the difference is (both should report scores on the test set)).

The same for AG's News with prompt "Assess the entire concept of the news story and choose from the World, Sports, Business or Tech categories to categorize it into the correct category.":

  • AlpacaHfChat: "chavinlo/alpaca-native", no grammar: 73.5/72.33
Edited by Max Kimmich

Merge request reports

Approval is optional

Merged by Max KimmichMax Kimmich 4 months ago (Oct 23, 2024 8:39am UTC)

Merge details

  • Changes merged into master with 5271cae4.
  • Did not delete the source branch.

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
82 82 system_message=None,
83 83 prompt=RUN_NAME_PROMPT,
84 84 use_randomness=True,
85 # a bit more randomness for the name is okay
86 temperature=1.2,
  • Max Kimmich added 1 commit

    added 1 commit

    • 49a22b31 - Fix not showing all inputs in ResponseEditor

    Compare with previous version

  • Max Kimmich marked the checklist item base prompt generation as completed

    marked the checklist item base prompt generation as completed

  • Max Kimmich marked this merge request as ready

    marked this merge request as ready

  • Max Kimmich mentioned in commit 5271cae4

    mentioned in commit 5271cae4

  • merged

  • Please register or sign in to reply
    Loading