lee
dolphinlee
AI & ML interests
LLM/DRL
Organizations
None yet
audio
Text-to-Image
- AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort
Paper • 2311.11243 • Published • 16
- Make Pixels Dance: High-Dynamic Video Generation
Paper • 2311.10982 • Published • 68
- Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Paper • 2311.10794 • Published • 27
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Paper • 2311.12793 • Published • 18
VLM
- Scalable Pre-training of Large Autoregressive Image Models
Paper • 2401.08541 • Published • 38
- MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets
Paper • 2403.03194 • Published • 15
- MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Paper • 2506.22434 • Published • 10
llm
- System 2 Attention (is something you might need too)
Paper • 2311.11829 • Published • 43
- TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Paper • 2311.11315 • Published • 7
- Alignment for Honesty
Paper • 2312.07000 • Published • 14
- Steering Llama 2 via Contrastive Activation Addition
Paper • 2312.06681 • Published • 14