Collections

Discover the best community collections!

Collections including paper arxiv:2402.04615

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 29
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

Papers related to web agents, screen agents, autonomous agents that can mimic human users in computer and/or web use

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
WebArena: A Realistic Web Environment for Building Autonomous Agents

Paper • 2307.13854 • Published Jul 25, 2023 • 25
Mind2Web: Towards a Generalist Agent for the Web

Paper • 2306.06070 • Published Jun 9, 2023 • 19
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Paper • 2410.13232 • Published Oct 17, 2024 • 44

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

OmniParser V2: OmniParser, turn your LLM into GUI agent

Space • Featured • Runtime error • 453
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
microsoft/Magma-8B

Robotics • 9B • Updated 28 days ago • 1.2k • 412
mlfoundations/Gelato-30B-A3B

Image-Text-to-Text • 31B • Updated Nov 15, 2025 • 187 • 28

Llammy3.2-3B-GUFF

prithivMLmods/Llama-Sentient-3.2-3B-Instruct

Text Generation • Updated Dec 10, 2024 • 18 • 9
bartendr604/Llama.Diffusion.Flix

Updated Apr 12, 2025 • 1
FLUX Unlimited: Use the FLUX model as much as you want.

Space • Running • 1.42k
HKUSTAudio/xcodec2

Audio-to-Audio • 0.8B • Updated Feb 23, 2025 • 14k • 94

Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 109
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 55
Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25, 2024 • 55
ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 101

ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44

