Cell2Sentence: Understanding Single-Cell Biology With LLMs

Yale University and Google announce Cell2Sentence-Scale (C2S-Scale), a set of open-source large language models trained to understand biology at the single-cell level

By bridging the gap between biology and artificial intelligence, C2S-Scale transforms complex single-cell expression data into "cell sentences": ordered sequences of gene names, ranked from highest to lowest expression, that a language model can read directly
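The core transformation can be sketched in a few lines. The gene names and counts below are invented for illustration, and `cell_to_sentence` is a hypothetical helper, not part of the released C2S-Scale code.

```python
# Sketch of the Cell2Sentence idea: a cell's expression profile becomes a
# "sentence" of gene names, ordered from highest to lowest expression.
# Gene names and counts here are illustrative, not real data.

def cell_to_sentence(expression, top_k=None):
    """Rank expressed genes by count (descending) and join their names."""
    ranked = sorted(
        (item for item in expression.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    if top_k is not None:
        ranked = ranked[:top_k]
    return " ".join(gene for gene, _ in ranked)

profile = {"MALAT1": 112, "CD3E": 41, "IL7R": 27, "HBB": 0, "GNLY": 3}
print(cell_to_sentence(profile))  # MALAT1 CD3E IL7R GNLY
```

Because the output is ordinary text, standard language-model tooling (tokenization, next-token training, generation) applies to it without modification.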

Each human is made up of trillions of cells, and each one has a specialized role, such as building organs, fighting infection, or carrying oxygen

In "Scaling Large Language Models for Next-Generation Single-Cell Analysis," Google introduces Cell2Sentence-Scale (C2S-Scale), a family of robust, open-source LLMs that can "read" and "write" biological data at the single-cell level

Cell2Sentence-Scale can automatically generate biological summaries of scRNA-seq data at multiple levels of complexity, from annotating the cell types of individual cells to producing summaries of entire tissues or experiments

A key finding of this work is that biological language models follow well-defined scaling laws: performance improves predictably as model size increases
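A scaling law of this kind is usually a power law in model size, which appears as a straight line in log-log space. The sketch below fits one with a plain least-squares regression; the (parameter count, loss) pairs are synthetic numbers invented for illustration, not C2S-Scale results.

```python
import math

# Hypothetical (parameter count, eval loss) pairs, invented for illustration.
# A well-defined scaling law means loss ≈ a * N^(-b), i.e. a straight line
# in log-log coordinates.
points = [(1e8, 2.10), (4e8, 1.78), (1.6e9, 1.51), (6.4e9, 1.28)]

xs = [math.log(n) for n, _ in points]
ys = [math.log(loss) for _, loss in points]

# Ordinary least-squares slope and intercept on the log-log data.
n = len(points)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
intercept = mean_y - slope * mean_x

# A negative slope means loss falls predictably as model size grows.
print(f"fitted exponent b = {-slope:.3f}")
```

A tight fit like this is what makes scaling "predictable": the curve fit on smaller models can be extrapolated to estimate how a larger model will perform before training it.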

Predicting a cell's response to a perturbation, such as a drug, a gene knockout, or cytokine exposure, is one of the most compelling uses of Cell2Sentence-Scale
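In the cell-sentence framing, perturbation prediction becomes conditional text generation: the model is prompted with a baseline cell sentence plus a description of the perturbation, and asked to generate the perturbed cell sentence. The template and gene names below are a hypothetical illustration of that framing, not the exact prompt format used by C2S-Scale.

```python
# Hypothetical prompt construction for perturbation prediction framed as
# text generation. Template and gene names are illustrative only.

def build_perturbation_prompt(cell_sentence, perturbation):
    """Format a prompt asking for the post-perturbation cell sentence."""
    return (
        f"Baseline cell: {cell_sentence}\n"
        f"Perturbation: {perturbation}\n"
        "Predicted cell after perturbation:"
    )

prompt = build_perturbation_prompt(
    "MALAT1 CD3E IL7R GNLY",  # genes ranked by expression
    "IFN-gamma exposure",     # a cytokine perturbation
)
print(prompt)
```

The model's completion is itself a ranked list of gene names, which can be compared against the measured post-perturbation expression profile to evaluate the prediction.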

Just as reinforcement learning is used to fine-tune large language models like Gemini to follow instructions and respond in helpful, human-aligned ways, Google applies comparable strategies to refine Cell2Sentence-Scale models for biological reasoning

Cell2Sentence materials and models are now available on GitHub and Hugging Face