Tomasz Korbak
Articles
24
The Two-Hop Curse: LLMs trained on A → B, B → C fail to learn A → C
6 January 2025 by Mikita Balesni and others
Computation and Language, Artificial Intelligence
Safety case template for frontier AI: A cyber inability argument
12 November 2024 by Arthur Goemans and others
Computers and Society, Cryptography and Security
Towards evaluations-based safety cases for AI scheming
7 November 2024 by Mikita Balesni and others
Cryptography and Security, Artificial Intelligence
Looking Inward: Language Models Can Learn About Themselves by Introspection
17 October 2024 by Felix Binder and others
Computation and Language, Artificial Intelligence
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
6 September 2024 by Usman Anwar and others at MIT
Machine Learning, Artificial Intelligence
The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
26 May 2024 by Lukas Berglund and others
Computation and Language, Artificial Intelligence
Inverse Scaling: When Bigger Isn't Better
13 May 2024 by Ian Mckenzie and others
Computation and Language, Artificial Intelligence
Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data
29 April 2024 by Matthias Gerstgrasser and others at Stanford University
Machine Learning, Artificial Intelligence
Aligning language models with human preferences
18 April 2024 by Tomasz Korbak
Machine Learning, Computation and Language
Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication
3 April 2024 by Łukasz Kuciński and others
Machine Learning, Artificial Intelligence
Compositional preference models for aligning LMs
14 March 2024 by Dongyoung Go and others
Computation and Language, Machine Learning
Improving Code Generation by Training with Natural Language Feedback
22 February 2024 by Angelica Chen and others
Software Engineering, Artificial Intelligence
Training Language Models with Language Feedback at Scale
22 February 2024 by Jérémy Scheurer and others
Computation and Language, Artificial Intelligence
Towards Understanding Sycophancy in Language Models
27 October 2023 by Mrinank Sharma and others
Computation and Language, Artificial Intelligence
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
11 September 2023 by Stephen Casper and others
Artificial Intelligence, Computation and Language
Taken out of context: On measuring situational awareness in LLMs
1 September 2023 by Lukas Berglund and others
Computation and Language, Machine Learning
Pretraining Language Models with Human Preferences
14 June 2023 by Tomasz Korbak and others
Computation and Language, Machine Learning
Aligning Language Models with Preferences through f-divergence Minimization
6 June 2023 by Dongyoung Go and others
Computation and Language, Machine Learning
Models of symbol emergence in communication: a conceptual review and a guide for avoiding local minima
8 March 2023 by Julian Zubek and others
Artificial Intelligence, Computation and Language
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
14 November 2022 by Tomasz Korbak and others
Machine Learning, Computation and Language
RL with KL penalties is better viewed as Bayesian inference
21 October 2022 by Tomasz Korbak and others
Machine Learning
Controlling Conditional Language Models without Catastrophic Forgetting
20 June 2022 by Tomasz Korbak and others
Machine Learning, Computation and Language
A continuity of Markov blanket interpretations under the Free Energy Principle
18 January 2022 by Anil Seth and others
Neurons and Cognition
Energy-Based Models for Code Generation under Compilability Constraints
9 June 2021 by Tomasz Korbak and others
Machine Learning, Computation and Language
Measuring non-trivial compositionality in emergent communication
29 October 2020 by Tomasz Korbak and others
Neural and Evolutionary Computing, Computation and Language
Developmentally motivated emergence of compositional communication via template transfer
4 October 2019 by Tomasz Korbak and others
Machine Learning, Artificial Intelligence
Exploiting Unsupervised Pre-training and Automated Feature Engineering for Low-resource Hate Speech Detection in Polish
17 June 2019 by Renard Korzeniowski and others
Computation and Language, Machine Learning
Fine-tuning Tree-LSTM for phrase-level sentiment classification on a Polish dependency treebank. Submission to PolEval task 2
3 November 2017 by Tomasz Korbak and Paulina Żak
Computation and Language
Topics
Machine Learning
Computation and Language
Artificial Intelligence
Computers and Society
Software Engineering
Multiagent Systems
Neural and Evolutionary Computing
Emerging Technologies
Neurons and Cognition