Guilherme Penedo
Articles
2
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
4 February 2025 by Loubna Ben Allal and others
Computation and Language
Towards Best Practices for Open Datasets for LLM Training
14 January 2025 by Stefan Baack and others
Computers and Society, Artificial Intelligence
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
25 June 2024 by Guilherme Penedo and others
Computation and Language
The Falcon Series of Open Language Models
29 November 2023 by Ebtesam Almazrouei and others
Computation and Language, Artificial Intelligence
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
1 June 2023 by Guilherme Penedo and others
Computation and Language, Artificial Intelligence
Topics
Computation and Language
Artificial Intelligence