Models

| Creator | Model | Model ID | Description | Access |
|---|---|---|---|---|
| AI21 Labs | J1-Jumbo v1 (178B) | `ai21/j1-jumbo` | Jurassic-1 Jumbo (178B parameters) (docs, tech report). | limited |
| AI21 Labs | J1-Large v1 (7.5B) | `ai21/j1-large` | Jurassic-1 Large (7.5B parameters) (docs, tech report). | limited |
| AI21 Labs | J1-Grande v1 (17B) | `ai21/j1-grande` | Jurassic-1 Grande (17B parameters) with a "few tweaks" to the training process (docs, tech report). | limited |
| AI21 Labs | J1-Grande v2 beta (17B) | `ai21/j1-grande-v2-beta` | Jurassic-1 Grande v2 beta (17B parameters). | limited |
| AI21 Labs | Jurassic-2 Jumbo (178B) | `ai21/j2-jumbo` | Jurassic-2 Jumbo (178B parameters) (docs). | limited |
| AI21 Labs | Jurassic-2 Grande (17B) | `ai21/j2-grande` | Jurassic-2 Grande (17B parameters) (docs). | limited |
| AI21 Labs | Jurassic-2 Large (7.5B) | `ai21/j2-large` | Jurassic-2 Large (7.5B parameters) (docs). | limited |
| Aleph Alpha | Luminous Base (13B) | `AlephAlpha/luminous-base` | Luminous Base (13B parameters) (docs). | limited |
| Aleph Alpha | Luminous Extended (30B) | `AlephAlpha/luminous-extended` | Luminous Extended (30B parameters) (docs). | limited |
| Aleph Alpha | Luminous Supreme (70B) | `AlephAlpha/luminous-supreme` | Luminous Supreme (70B parameters) (docs). | limited |
| Anthropic | Anthropic-LM v4-s3 (52B) | `anthropic/stanford-online-all-v4-s3` | A 52B parameter language model trained using reinforcement learning from human feedback (paper). | closed |
| Anthropic | Anthropic Claude v1.3 | `anthropic/claude-v1.3` | A model trained using reinforcement learning from human feedback (docs). | limited |
| Anthropic | Anthropic Claude Instant V1 | `anthropic/claude-instant-v1` | A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs). | limited |
| UC Berkeley | Koala (13B) | `together/koala-13b` | Koala (13B) is a chatbot fine-tuned from LLaMA (13B) on dialogue data gathered from the web (blog post). | open |
| BigScience | BLOOM (176B) | `together/bloom` | BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages (paper). | open |
| BigScience | BLOOMZ (176B) | `together/bloomz` | BLOOMZ (176B parameters) is BLOOM that has been fine-tuned on natural language instructions (details). | open |
| BigScience | T0pp (11B) | `together/t0pp` | T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts (paper). | open |
| BigCode | SantaCoder (1.1B) | `huggingface/santacoder` | SantaCoder (1.1B parameters) is a model trained on the Python, Java, and JavaScript subset of The Stack (v1.1) (model card). | open |
| BigCode | StarCoder (15.5B) | `huggingface/starcoder` | StarCoder (15.5B parameters) is a model trained on 80+ programming languages from The Stack (v1.2) (model card). | open |
| Cerebras | Cerebras GPT (6.7B) | `together/cerebras-gpt-6.7b` | Cerebras GPT is a family of open, compute-optimal language models scaled from 111M to 13B parameters, trained on the EleutherAI Pile (paper). | limited |
| Cerebras | Cerebras GPT (13B) | `together/cerebras-gpt-13b` | Cerebras GPT is a family of open, compute-optimal language models scaled from 111M to 13B parameters, trained on the EleutherAI Pile (paper). | limited |
| Cohere | Cohere xlarge v20220609 (52.4B) | `cohere/xlarge-20220609` | Cohere xlarge v20220609 (52.4B parameters). | limited |
| Cohere | Cohere large v20220720 (13.1B) | `cohere/large-20220720` | Cohere large v20220720 (13.1B parameters), which is deprecated by Cohere as of December 2, 2022. | limited |
| Cohere | Cohere medium v20220720 (6.1B) | `cohere/medium-20220720` | Cohere medium v20220720 (6.1B parameters). | limited |
| Cohere | Cohere small v20220720 (410M) | `cohere/small-20220720` | Cohere small v20220720 (410M parameters), which is deprecated by Cohere as of December 2, 2022. | limited |
| Cohere | Cohere xlarge v20221108 (52.4B) | `cohere/xlarge-20221108` | Cohere xlarge v20221108 (52.4B parameters). | limited |
| Cohere | Cohere medium v20221108 (6.1B) | `cohere/medium-20221108` | Cohere medium v20221108 (6.1B parameters). | limited |
| Cohere | Cohere Command beta (6.1B) | `cohere/command-medium-beta` | Cohere Command beta (6.1B parameters) is fine-tuned from the medium model to respond well to instruction-like prompts (details). | limited |
| Cohere | Cohere Command beta (52.4B) | `cohere/command-xlarge-beta` | Cohere Command beta (52.4B parameters) is fine-tuned from the XL model to respond well to instruction-like prompts (details). | limited |
| Databricks | Dolly V2 (3B) | `databricks/dolly-v2-3b` | Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-2.8b. | open |
| Databricks | Dolly V2 (7B) | `databricks/dolly-v2-7b` | Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-6.9b. | open |
| Databricks | Dolly V2 (12B) | `databricks/dolly-v2-12b` | Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b. | open |
| DeepMind | Gopher (280B) | `deepmind/gopher` | Gopher (280B parameters) (paper). | closed |
| DeepMind | Chinchilla (70B) | `deepmind/chinchilla` | Chinchilla (70B parameters) (paper). | closed |
| EleutherAI | GPT-J (6B) | `together/gpt-j-6b` | GPT-J (6B parameters) autoregressive language model trained on The Pile (details). | open |
| EleutherAI | GPT-NeoX (20B) | `together/gpt-neox-20b` | GPT-NeoX (20B parameters) autoregressive language model trained on The Pile (paper). | open |
| EleutherAI | Pythia (3B) | `together/pythia-3b` | Pythia (3B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
| EleutherAI | Pythia (7B) | `together/pythia-7b` | Pythia (7B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
| EleutherAI | Pythia (12B) | `together/pythia-12b` | Pythia (12B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
| Google | T5 (11B) | `together/t5-11b` | T5 (11B parameters) is an encoder-decoder model trained on a multi-task mixture, where each task is converted into a text-to-text format (paper). | open |
| Google | UL2 (20B) | `together/ul2` | UL2 (20B parameters) is an encoder-decoder model trained on the C4 corpus. It is similar to T5 but trained with a different objective and slightly different scaling knobs (paper). | open |
| Google | Flan-T5 (11B) | `together/flan-t5-xxl` | Flan-T5 (11B parameters) is T5 fine-tuned on 1.8K tasks (paper). | open |
| Google | PaLM (540B) | `google/palm` | Pathways Language Model (540B parameters) is trained using 6144 TPU v4 chips (paper). | closed |
| HazyResearch | H3 (2.7B) | `together/h3-2.7b` | H3 (2.7B parameters) is a decoder-only language model based on state space models (paper). | open |
| Meta | OPT-IML (175B) | `together/opt-iml-175b` | OPT-IML (175B parameters) is a suite of decoder-only transformer LMs that are multi-task fine-tuned on 2000 datasets (paper). | open |
| Meta | OPT-IML (30B) | `together/opt-iml-30b` | OPT-IML (30B parameters) is a suite of decoder-only transformer LMs that are multi-task fine-tuned on 2000 datasets (paper). | open |
| Meta | OPT (175B) | `together/opt-175b` | Open Pre-trained Transformers (175B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
| Meta | OPT (66B) | `together/opt-66b` | Open Pre-trained Transformers (66B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
| Meta | OPT (6.7B) | `together/opt-6.7b` | Open Pre-trained Transformers (6.7B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
| Meta | OPT (1.3B) | `together/opt-1.3b` | Open Pre-trained Transformers (1.3B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
| Meta | Galactica (120B) | `together/galactica-120b` | Galactica (120B parameters) is trained on 48 million papers, textbooks, lecture notes, compounds and proteins, scientific websites, etc. (paper). | open |
| Meta | Galactica (30B) | `together/galactica-30b` | Galactica (30B parameters) is trained on 48 million papers, textbooks, lecture notes, compounds and proteins, scientific websites, etc. (paper). | open |
| Meta | LLaMA (7B) | `huggingface/llama-7b` | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Stanford | Alpaca (7B) | `huggingface/alpaca-7b` | Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. | open |
| Meta | LLaMA (7B) | `together/llama-7b` | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Meta | LLaMA (13B) | `together/llama-13b` | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Meta | LLaMA (30B) | `together/llama-30b` | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Meta | LLaMA (65B) | `together/llama-65b` | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Stability AI | StableLM-Base-Alpha (7B) | `stabilityai/stablelm-base-alpha-7b` | StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models. | open |
| Stanford | Alpaca (7B) | `together/alpaca-7b` | Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. | open |
| Stanford | Alpaca (13B) | `together/alpaca-13b` | Alpaca 13B is a model fine-tuned from the LLaMA 13B model on 52K instruction-following demonstrations. | open |
| Stanford | Alpaca (30B) | `together/alpaca-30b` | Alpaca 30B is a model fine-tuned from the LLaMA 30B model on 52K instruction-following demonstrations. | open |
| LMSYS | Vicuna (13B) | `together/vicuna-13b` | Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. | open |
| Microsoft/NVIDIA | TNLG v2 (530B) | `microsoft/TNLGv2_530B` | TNLG v2 (530B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper). | closed |
| Microsoft/NVIDIA | TNLG v2 (6.7B) | `microsoft/TNLGv2_7B` | TNLG v2 (6.7B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper). | closed |
| OpenAI | davinci (175B) | `openai/davinci` | Original GPT-3 (175B parameters) autoregressive language model (paper, docs). | limited |
| OpenAI | curie (6.7B) | `openai/curie` | Original GPT-3 (6.7B parameters) autoregressive language model (paper, docs). | limited |
| OpenAI | babbage (1.3B) | `openai/babbage` | Original GPT-3 (1.3B parameters) autoregressive language model (paper, docs). | limited |
| OpenAI | ada (350M) | `openai/ada` | Original GPT-3 (350M parameters) autoregressive language model (paper, docs). | limited |
| OpenAI | text-davinci-003 | `openai/text-davinci-003` | Trained using reinforcement learning (PPO) with reward models; derived from text-davinci-002 (docs). | limited |
| OpenAI | text-davinci-002 | `openai/text-davinci-002` | Trained with supervised fine-tuning on human-written demonstrations; derived from code-davinci-002 (docs). | limited |
| OpenAI | text-davinci-001 | `openai/text-davinci-001` | Trained with supervised fine-tuning on human-written demonstrations (docs). | limited |
| OpenAI | text-curie-001 | `openai/text-curie-001` | Trained with supervised fine-tuning on human-written demonstrations (docs). | limited |
| OpenAI | text-babbage-001 | `openai/text-babbage-001` | Trained with supervised fine-tuning on human-written demonstrations (docs). | limited |
| OpenAI | text-ada-001 | `openai/text-ada-001` | Trained with supervised fine-tuning on human-written demonstrations (docs). | limited |
| OpenAI | gpt-4-0314 | `openai/gpt-4-0314` | GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from March 14, 2023. | limited |
| OpenAI | gpt-4-32k-0314 | `openai/gpt-4-32k-0314` | GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens, from March 14, 2023. | limited |
| OpenAI | code-davinci-002 | `openai/code-davinci-002` | Codex-style model that is designed for pure code-completion tasks (docs). | limited |
| OpenAI | code-davinci-001 | `openai/code-davinci-001` | code-davinci-001 model. | limited |
| OpenAI | code-cushman-001 (12B) | `openai/code-cushman-001` | Codex-style model that is a stronger, multilingual version of the Codex (12B) model in the Codex paper. | limited |
| OpenAI | gpt-3.5-turbo-0301 | `openai/gpt-3.5-turbo-0301` | Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-03-01. | limited |
| OpenAI | gpt-3.5-turbo-0613 | `openai/gpt-3.5-turbo-0613` | Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13. | limited |
| OpenAI | ChatGPT | `openai/chat-gpt` | Sibling model to InstructGPT which interacts in a conversational way. See OpenAI's announcement. The size of the model is unknown. | limited |
| Together | GPT-JT (6B) | `together/Together-gpt-JT-6B-v1` | GPT-JT (6B parameters) is a fork of GPT-J (blog post). | open |
| Together | GPT-NeoXT-Chat-Base (20B) | `together/gpt-neoxt-chat-base-20b` | GPT-NeoXT-Chat-Base (20B) is fine-tuned from GPT-NeoX, serving as a base model for developing open-source chatbots. | open |
| Together | RedPajama-INCITE-Base-v1 (3B) | `together/redpajama-incite-base-3b-v1` | RedPajama-INCITE-Base-v1 (3B parameters) is a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
| Together | RedPajama-INCITE-Instruct-v1 (3B) | `together/redpajama-incite-instruct-3b-v1` | RedPajama-INCITE-Instruct-v1 (3B parameters) is fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
| Together | RedPajama-INCITE-Chat-v1 (3B) | `together/redpajama-incite-chat-3b-v1` | RedPajama-INCITE-Chat-v1 (3B parameters) is fine-tuned on OASST1 and Dolly2 to enhance chatting ability. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
| Together | RedPajama-INCITE-Base-v1 (7B) | `together/redpajama-incite-base-7b-v1` | RedPajama-INCITE-Base-v1 (7B parameters) is a 7 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
| MosaicML | MPT (7B) | `mosaicml/mpt-7b` | MPT-7B is a Transformer trained from scratch on 1T tokens of text and code. | open |
| MosaicML | MPT-Chat (7B) | `mosaicml/mpt-7b-chat` | MPT-Chat (7B) is a chatbot-like model for dialogue generation. It is built by finetuning MPT-7B, a Transformer trained from scratch on 1T tokens of text and code. | open |
| MosaicML | MPT-Instruct (7B) | `mosaicml/mpt-7b-instruct` | MPT-Instruct (7B) is a model for short-form instruction following. It is built by finetuning MPT (7B), a Transformer trained from scratch on 1T tokens of text and code. | open |
| Salesforce | CodeGen (16B) | `together/codegen` | CodeGen (16B parameters) is an open dense code model trained for multi-turn program synthesis (blog). | open |
| Tsinghua | GLM (130B) | `together/glm` | GLM (130B parameters) is an open bilingual (English & Chinese) bidirectional dense model that was trained using the General Language Model (GLM) procedure (paper). | open |
| Tsinghua | CodeGeeX (13B) | `together/codegeex` | CodeGeeX (13B parameters) is an open dense code model trained on more than 20 programming languages on a corpus of more than 850B tokens (blog). | open |
| Writer | Palmyra Base (5B) | `writer/palmyra-base` | Palmyra Base (5B parameters). | limited |
| Writer | Palmyra Large (20B) | `writer/palmyra-large` | Palmyra Large (20B parameters). | limited |
| Writer | InstructPalmyra (30B) | `writer/palmyra-instruct-30` | InstructPalmyra (30B parameters). | limited |
| Writer | Palmyra E (30B) | `writer/palmyra-e` | Palmyra E (30B parameters). | limited |
| Writer | Silk Road (35B) | `writer/silk-road` | Silk Road (35B parameters). | limited |
| Writer | Palmyra X (43B) | `writer/palmyra-x` | Palmyra X (43B parameters). | limited |
| Yandex | YaLM (100B) | `together/yalm` | YaLM (100B parameters) is an autoregressive language model trained on English and Russian text (GitHub: https://github.com/yandex/YaLM-100B). | open |
| NVIDIA | Megatron GPT2 | `nvidia/megatron-gpt2` | GPT-2 implemented in Megatron-LM (paper). | open |
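
The `creator/model-name` strings in the Model ID column are the identifiers used to refer to these models programmatically. As a minimal, framework-agnostic sketch (the `MODELS` excerpt and the `by_access` helper below are illustrative only and are not part of any listed provider's or toolkit's API), the following shows how a catalog like this table can be grouped by access tier:

```python
# Hypothetical helper for grouping the model catalog above by access tier.
# The (model_id, access) pairs are a small excerpt copied from the table rows;
# nothing here is tied to a specific provider SDK or evaluation framework.
from collections import defaultdict

MODELS = {
    "ai21/j1-jumbo": "limited",
    "anthropic/claude-v1.3": "limited",
    "together/bloom": "open",
    "together/llama-65b": "open",
    "openai/gpt-4-0314": "limited",
    "deepmind/chinchilla": "closed",
}

def by_access(models: dict[str, str]) -> dict[str, list[str]]:
    """Group model identifiers by their access tier (open / limited / closed)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for model_id, access in models.items():
        groups[access].append(model_id)
    return dict(groups)

if __name__ == "__main__":
    for tier, ids in by_access(MODELS).items():
        print(f"{tier}: {', '.join(sorted(ids))}")
```

Running the sketch prints each access tier with its model identifiers, e.g. `open: together/bloom, together/llama-65b`.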