283 models available
MiniCPM-V 2.6 is OpenBMB's compact multimodal model supporting image and video understanding alongside text. Delivers strong visual reasoning and OCR capabilities at 8B parameter scale.
Ministral 3 8B is a balanced model in the Ministral 3 family: a compact, efficient language model with vision capabilities.
Gemma 3 12B is Google's mid-range Gemma 3 model with vision capabilities. Offers strong reasoning, code generation, and image understanding balanced with practical resource requirements.
Model type: LLaVA is an open-source chatbot trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: mistralai/Mistral-7B-Instruct-v0.2
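As a rough sketch of how a LLaVA-on-Mistral checkpoint is typically prompted: the text side follows the Mistral-Instruct `[INST] … [/INST]` format, with an `<image>` placeholder marking where the processor splices in image embeddings. The exact template can vary by release, so the helper below (`build_llava_prompt` is a hypothetical name) is an assumption to be checked against the model card.

```python
# Hypothetical helper: format a LLaVA prompt in the Mistral-Instruct style
# assumed by the base LLM. "<image>" is the placeholder the multimodal
# processor replaces with image features; verify the exact template for
# the specific checkpoint you use.
def build_llava_prompt(question: str, image_count: int = 1) -> str:
    image_tokens = "\n".join("<image>" for _ in range(image_count))
    return f"[INST] {image_tokens}\n{question} [/INST]"
```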
Nemotron Nano 8B is NVIDIA's reasoning model derived from Llama 3.1 8B Instruct, post-trained for switchable reasoning with on/off modes. Achieves 95.4% on MATH-500 and 54.1% on GPQA Diamond with reasoning enabled. Fits on a single RTX GPU for local deployment.
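The on/off reasoning switch is controlled at the prompt level. As a minimal sketch, assuming the control is a system-prompt string of the form "detailed thinking on" / "detailed thinking off" (an assumption based on NVIDIA's published usage notes; confirm against the model card), the chat payload might be built like this:

```python
# Sketch: build a chat-messages payload that toggles Nemotron's reasoning
# mode via the system prompt. The control strings are an assumption; check
# the model card for the exact wording your checkpoint expects.
def nemotron_messages(user_prompt: str, reasoning: bool) -> list[dict]:
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]
```

The resulting list can be passed to any OpenAI-style chat endpoint or to a tokenizer's chat template.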
*In the tapestry of Greek mythology, Hermes reigns as the eloquent Messenger of the Gods, a deity who deftly bridges the realms through the art of communication. It is in homage to this divine mediator that I name this advanced LLM "Hermes," a system crafted to navigate the complex intricacies of human discourse with celestial finesse.*
Phi-3-Medium-128K-Instruct is a lightweight, state-of-the-art open model with 14B parameters, trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data with a focus on high-quality, reasoning-dense properties. The model belongs to the Phi-3 family; the Medium version comes in two variants, 4K and 128K, which denote the context length (in tokens) it can support.
license: apache-2.0
language:
  - en
pipeline_tag: image-text-to-text
tags:
  - multimodal
library_name: transformers