Syco-bench: A Benchmark for LLM Sycophancy

Syco-bench is a four-part benchmark that evaluates how much models flatter and defer to their users. Its four tests are picking sides, mirroring, attribution bias, and delusion acceptance. The charts below show the results for each model tested, both with and without the system prompt used in the provider's web interface.
Picking Sides Chart
Mirroring Chart
Attribution Bias Chart
Delusion Acceptance Chart

The results show substantial differences between models on each individual test. However, the correlations between tests are generally weak. This suggests either that each test captures a relatively independent aspect of sycophancy, or that some tests are not well aligned with our concept of sycophancy. The correlation matrix below summarizes the pairwise relationships between the four test scores across models.

Correlation Matrix of Test Scores
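For readers who want to reproduce a matrix like the one above from their own per-model scores, here is a minimal sketch. The page does not say which correlation coefficient was used or how scores were normalized, so this assumes plain Pearson correlation over one score per model per test. The test keys and the `correlation_matrix` helper are hypothetical names chosen for illustration, not part of Syco-bench.

```python
from math import sqrt

# Hypothetical keys for the four tests; the benchmark's own identifiers may differ.
TESTS = ["picking_sides", "mirroring", "attribution_bias", "delusion_acceptance"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def correlation_matrix(scores, tests=TESTS):
    """Assumed input: scores maps model name -> {test name: score}.

    Returns a square list-of-lists where entry [i][j] is the Pearson
    correlation, across models, between tests[i] and tests[j].
    """
    models = sorted(scores)  # fixed model order so every column lines up
    columns = [[scores[m][t] for m in models] for t in tests]
    return [[pearson(a, b) for b in columns] for a in columns]
```

The diagonal is always 1 and the matrix is symmetric. With only a small number of models, each off-diagonal entry rests on few data points, so a rank-based coefficient such as Spearman's is a reasonable alternative if the scores contain outliers.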
Contact: [email protected]