| Zeng, Yang, Zhou, Tan et al. — AIR-Bench 2024 (ICLR 2025) | AIR-Bench 2024 | huggingface.co/datasets/stanford-crfm/air-bench-2024 |
| van Ede, Aghakhani, Spahn, Bortolameotti, Cova, Continella, van Steen, Peter, Kruegel, Vigna — DeepCASE (IEEE S&P 2022) | DeepCASE | github.com/ucsb-seclab/DeepCASE-Dataset |
| Wang, Chen, Pei, Xie, Kang et al. — DecodingTrust (NeurIPS 2023) | DecodingTrust | huggingface.co/datasets/AI-Secure/DecodingTrust |
| Wu, Chen, Corcoran, Sra, Singh — GraphEval36K (NAACL 2025) | GraphEval36K | grapheval36k.github.io/ |
| Xu, Jiang, Niu, Deng et al. — Magpie (ICLR 2025) | Magpie | huggingface.co/Magpie-Align |
| Sonya, Zou, Vasan, Kruegel, Vigna, Xu — One Size Doesn't Fit All (SECAI/ESORICS 2025) | MABEL | github.com/action-ai-institute/MABEL-dataset |
| Xu, Zhang, Chen, Xie, Kang et al. — MMDT (ICLR 2025) | MMDT | mmdecodingtrust.github.io/ |
| Kang, Chen, Xu, Zhang et al. — PolyGuard (NeurIPS 2025) | PolyGuard | huggingface.co/datasets/AI-Secure/PolyGuard |
| Jiang, Ma, Xu, Li, Ramasubramanian et al. — SOSBench (2025) | SOSBench | huggingface.co/datasets/SOSBench/SOSBench |
| Xu, Meza Soria, Tan, Roy, Agrawal, Poovendran, Panda — TOUCAN (ICLR 2026) | TOUCAN | github.com/TheAgentArk/Toucan |
| Chen, Chen, Singh, Sra — XplainLLM (EMNLP 2024) | XplainLLM | github.com/chen-zichen/xplainllm_dataset |