Rank | Model (#Params) | Average Relative Drop (%) | BBH Original | BBH Attacked | BBH Relative Drop (%) | GSM8K Original | GSM8K Attacked | GSM8K Relative Drop (%) | MMLU Original | MMLU Attacked | MMLU Relative Drop (%) |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Mistral-8X7B (47B) | 11.9% | 65.6 | 58.3 (↓7.3) | 11.1% | 68.5 | 57.9 (↓10.6) | 15.5% | 68.4 | 62.1 (↓6.3) | 9.2% |
2 | Vicuna-13b (13B) | 17.2% | 51.2 | 40.8 (↓10.4) | 20.4% | 33.4 | 26.2 (↓7.2) | 21.6% | 53.4 | 48.2 (↓5.2) | 9.7% |
3 | Gemma-7b (8.5B) | 19.1% | 42.4 | 33.5 (↓8.9) | 21.0% | 39.9 | 29.8 (↓10.1) | 25.3% | 53.5 | 47.6 (↓5.9) | 11.0% |
4 | Vicuna-33b (33B) | 20.9% | 52.1 | 42.5 (↓9.6) | 18.5% | 38.2 | 26.4 (↓11.8) | 30.9% | 59.2 | 51.4 (↓7.8) | 13.2% |
5 | Mistral-7b (7.2B) | 25.9% | 50.0 | 39.1 (↓10.9) | 21.8% | 43.7 | 27.1 (↓16.6) | 38.0% | 54.6 | 44.8 (↓9.8) | 17.9% |
6 | Llama2-7b (6.7B) | 29.6% | 35.7 | 26.8 (↓8.9) | 24.9% | 27.3 | 14.7 (↓12.6) | 46.2% | 35.1 | 28.9 (↓6.2) | 17.7% |
7 | Gemma-2b (2.5B) | 34.7% | 29.6 | 20.2 (↓9.4) | 31.8% | 15.1 | 7.1 (↓8.0) | 53.0% | 34.1 | 27.5 (↓6.6) | 19.4% |
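
The table does not state how the Relative Drop columns are derived. The reported values appear consistent with (Original − Attacked) / Original, with the Average Relative Drop taken as the mean over the three datasets. The sketch below reproduces the Gemma-7b row under that assumption; the function names `relative_drop` and `average_relative_drop` are illustrative and do not come from the source.

```python
def relative_drop(original: float, attacked: float) -> float:
    """Percentage of the original score lost under attack (assumed definition)."""
    return (original - attacked) / original * 100.0


def average_relative_drop(scores: list[tuple[float, float]]) -> float:
    """Mean of the per-dataset relative drops (assumed aggregation)."""
    drops = [relative_drop(original, attacked) for original, attacked in scores]
    return sum(drops) / len(drops)


# Gemma-7b row: (Original, Attacked) pairs for BBH, GSM8K, MMLU.
gemma_7b = [(42.4, 33.5), (39.9, 29.8), (53.5, 47.6)]

per_dataset = [round(relative_drop(o, a), 1) for o, a in gemma_7b]
average = round(average_relative_drop(gemma_7b), 1)

print(per_dataset, average)  # [21.0, 25.3, 11.0] 19.1
```

Recomputing from the rounded scores shown in the table can differ from a reported cell by 0.1 (for example, Vicuna-13b on BBH), presumably because the reported drops were computed from unrounded accuracies.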