Vision - a melvindave Collection

Vision

updated Dec 10, 2025

Running on CPU Upgrade

978

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection

Running on Zero

Featured

380

DeepSeek OCR 2 Demo

🚀

Try out DeepSeek-OCR-2 on your PDFs or images

Running on Zero

MCP

60

Multimodal OCR3

🌖

nanonets2-ocr / chandra-ocr / dots.ocr / olm-ocr2

Qwen/Qwen3-VL-30B-A3B-Instruct

Image-Text-to-Text • 31B • Updated Nov 26, 2025 • 858k • 521

Note: running locally in LM Studio

Qwen/Qwen3-VL-235B-A22B-Thinking

Image-to-Text • 236B • Updated Nov 26, 2025 • 130k • 367

Note: inference available

Qwen/Qwen3-VL-235B-A22B-Instruct

Image-to-Text • 236B • Updated Nov 26, 2025 • 235k • 361

Note: inference available

Qwen/Qwen2.5-VL-7B-Instruct

Image-Text-to-Text • 8B • Updated Apr 6, 2025 • 3.14M • 1.44k

zai-org/GLM-4.6V

Image-Text-to-Text • 108B • Updated Dec 9, 2025 • 53.8k • 367

Running on Zero

Featured

113

VLM Object Understanding

🦀

Explore object detection, visual grounding, keypoint Detecti