Refactor tasks and models and fix format for various models

Merged Max Kimmich requested to merge refactor-models into master

Refactor models and tasks so that the prompt format can be adapted depending on the model (sorry, the branch name is a bit misleading – I initially thought I would only refactor the models):

  • The model's chat format is respected when building prompts
  • Only the parameters relevant to the respective model are used when computing the cache key (both points are sketched below)

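To make the two points above a bit more concrete, here is a minimal sketch (not the actual code of this MR; `build_prompt`, `compute_cache_key`, and the parameter handling are made-up illustrations) of respecting the model's chat template and hashing only the model-relevant parameters:

```python
import hashlib
import json

from transformers import AutoTokenizer


def build_prompt(model_name: str, system_message: str, user_message: str) -> str:
    """Render a prompt with the model's own chat template instead of a
    hard-coded format (hypothetical helper, not the MR's implementation)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    messages = [
        {"role": "system", "content": system_message},
        {"role": "user", "content": user_message},
    ]
    # apply_chat_template formats the messages the way the model was trained on
    # (e.g. Llama 3.1 instruct tags vs. Alpaca-style instruction/response sections).
    return tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )


def compute_cache_key(model_name: str, params: dict, relevant_keys: set) -> str:
    """Hash only the parameters that actually influence this model's output,
    so unrelated options do not change the cache key (again hypothetical)."""
    relevant = {k: v for k, v in sorted(params.items()) if k in relevant_keys}
    payload = json.dumps({"model": model_name, "params": relevant}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

One caveat for such a sketch: checkpoints whose tokenizer config does not ship a chat template (e.g. plain Alpaca-style models) would still need an explicit template to be supplied.
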
There are still some things to do:

  • Test that the llama-cpp backend produces the correct format (see the sketch below)
  • Make sure that all models behave similarly, i.e., that each has its own set of parameters

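For the first todo, a minimal way to exercise the chat format handling on that backend (a sketch assuming the llama-cpp-python bindings; the model path is just an abbreviated example from the list below) could look like this:

```python
from llama_cpp import Llama

# llama-cpp-python reads the chat template from the GGUF metadata by default;
# it can also be forced explicitly via the chat_format argument (e.g. "llama-3").
llm = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct.Q8_0.gguf",  # example path, adjust as needed
    n_ctx=4096,
)

# create_chat_completion applies the chat format before generation, so the
# result can be compared against what the HF-based backends produce for the
# same messages.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Classify the sentiment of this movie review: ..."},
    ],
    max_tokens=16,
)
print(response["choices"][0]["message"]["content"])
```
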
Current results for the prompt "Have your friend evaluate the movie they had just seen and provide a summary opinion (e.g. terrible, bad, okay, good, or great) to determine the sentiment of the movie review." on SST5 (dev set/test set), for comparison with the reference implementation, are:

  • AlpacaHfChat: "chavinlo/alpaca-native", no grammar: 52/49.23
  • HfChat: "meta-llama/Meta-Llama-3.1-8B-Instruct", no grammar: 54.5/52.44
  • LlamaChat: "QuantFactory/Meta-Llama-3.1-8B-Instruct-GGUF/Meta-Llama-3.1-8B-Instruct.Q8_0.gguf", no grammar: 55.5/53.48
  • LlamaChat: "TheBloke/Llama-2-70B-Chat-GGUF/llama-2-70b-chat.Q4_K_M.gguf", no grammar: 49/47.15
  • LlamaChat: "MaziyarPanahi/Meta-Llama-3.1-70B-Instruct-GGUF/Meta-Llama-3.1-70B-Instruct.Q4_K_M.gguf", no grammar: 57/53.53

For reference, the results on SST5 for Alpaca 7B ("chavinlo/alpaca-native") from the original paper are 49.91 or 52.26, depending on which table one refers to (Table 1 and Table 14, respectively); it is unclear where the difference comes from, since both should report scores on the test set.

Edited by Max Kimmich
