Andy Zou
Activity
Humanity's Last Exam
4 days ago by
Long Phan
and
others
Machine Learning
,
Artificial Intelligence
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents
5 days ago by
Maksym Andriushchenko
and
others
at
Carnegie Mellon University
Machine Learning
,
Artificial Intelligence
Representation Engineering: A Top-Down Approach to AI Transparency
3 March 2025 by
Andy Zou
and
others
at
Carnegie Mellon University
Machine Learning
,
Artificial Intelligence
Path-Consistency: Prefix Enhancement for Efficient Inference in LLM
2 March 2025 by
Jiace Zhu
and
others
Computation and Language
,
Artificial Intelligence
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
20 February 2025 by
Yue Huang
and
others
Computers and Society
Tamper-Resistant Safeguards for Open-Weight LLMs
10 February 2025 by
Rishub Tamirisa
and
others
Machine Learning
,
Artificial Intelligence
Loss of STIM1 and STIM2 in salivary glands disrupts ANO1 function but does not induce Sjogren's disease
31 October 2024 by
G. -Y. Son
and
others
at
New York University
Physiology
NeuralMatrix: Compute the Entire Neural Networks with Linear Matrix Operations for Efficient Inference
20 August 2024 by
Ruiqi Sun
and
others
at
Shanghai Jiao Tong University
Machine Learning
,
Artificial Intelligence
Order of Compression: A Systematic and Optimal Sequence to Combinationally Compress CNN
17 August 2024 by
Yingtao Shen
and
others
at
Shanghai Jiao Tong University
Machine Learning
,
Computer Vision and Pattern Recognition
Improving Alignment and Robustness with Circuit Breakers
12 July 2024 by
Andy Zou
and
others
at
Carnegie Mellon University
Machine Learning
,
Artificial Intelligence
Lessons from the Trenches on Reproducible Evaluation of Language Models
29 May 2024 by
Stella Biderman
and
others
at
Sorbonne University
Computation and Language
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
15 May 2024 by
Nathaniel Li
and
others
at
UC Berkeley
Machine Learning
,
Artificial Intelligence
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
27 February 2024 by
Mantas Mazeika
and
others
at
Carnegie Mellon University
Machine Learning
,
Artificial Intelligence