Models
Creator | Model | Description | Access |
---|---|---|---|
AI21 Labs | J1-Jumbo v1 (178B) ai21/j1-jumbo | Jurassic-1 Jumbo (178B parameters) (docs, tech report). | limited |
AI21 Labs | J1-Large v1 (7.5B) ai21/j1-large | Jurassic-1 Large (7.5B parameters) (docs, tech report). | limited |
AI21 Labs | J1-Grande v1 (17B) ai21/j1-grande | Jurassic-1 Grande (17B parameters) with a "few tweaks" to the training process (docs, tech report). | limited |
AI21 Labs | J1-Grande v2 beta (17B) ai21/j1-grande-v2-beta | Jurassic-1 Grande v2 beta (17B parameters). | limited |
AI21 Labs | Jurassic-2 Jumbo (178B) ai21/j2-jumbo | Jurassic-2 Jumbo (178B parameters) (docs). | limited |
AI21 Labs | Jurassic-2 Grande (17B) ai21/j2-grande | Jurassic-2 Grande (17B parameters) (docs). | limited |
AI21 Labs | Jurassic-2 Large (7.5B) ai21/j2-large | Jurassic-2 Large (7.5B parameters) (docs). | limited |
Aleph Alpha | Luminous Base (13B) AlephAlpha/luminous-base | Luminous Base (13B parameters) (docs). | limited |
Aleph Alpha | Luminous Extended (30B) AlephAlpha/luminous-extended | Luminous Extended (30B parameters) (docs). | limited |
Aleph Alpha | Luminous Supreme (70B) AlephAlpha/luminous-supreme | Luminous Supreme (70B parameters) (docs). | limited |
Anthropic | Anthropic-LM v4-s3 (52B) anthropic/stanford-online-all-v4-s3 | A 52B-parameter language model trained using reinforcement learning from human feedback (paper). | closed |
Anthropic | Anthropic Claude v1.3 anthropic/claude-v1.3 | A model trained using reinforcement learning from human feedback (docs). | limited |
Anthropic | Anthropic Claude Instant V1 anthropic/claude-instant-v1 | A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs). | limited |
UC Berkeley | Koala (13B) together/koala-13b | Koala (13B) is a chatbot fine-tuned from LLaMA (13B) on dialogue data gathered from the web (blog post). | open |
BigScience | BLOOM (176B) together/bloom | BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages (paper). | open |
BigScience | BLOOMZ (176B) together/bloomz | BLOOMZ (176B parameters) is BLOOM that has been fine-tuned on natural language instructions (details). | open |
BigScience | T0pp (11B) together/t0pp | T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts (paper). | open |
BigCode | SantaCoder (1.1B) huggingface/santacoder | SantaCoder (1.1B parameters) is a model trained on the Python, Java, and JavaScript subset of The Stack (v1.1) (model card). | open |
BigCode | StarCoder (15.5B) huggingface/starcoder | StarCoder (15.5B parameters) is a model trained on 80+ programming languages from The Stack (v1.2) (model card). | open |
Cerebras | Cerebras GPT (6.7B) together/cerebras-gpt-6.7b | Cerebras GPT is a family of open compute-optimal language models scaled from 111M to 13B parameters trained on the Eleuther Pile. (paper) | limited |
Cerebras | Cerebras GPT (13B) together/cerebras-gpt-13b | Cerebras GPT is a family of open compute-optimal language models scaled from 111M to 13B parameters trained on the Eleuther Pile. (paper) | limited |
Cohere | Cohere xlarge v20220609 (52.4B) cohere/xlarge-20220609 | Cohere xlarge v20220609 (52.4B parameters). | limited |
Cohere | Cohere large v20220720 (13.1B) cohere/large-20220720 | Cohere large v20220720 (13.1B parameters), which is deprecated by Cohere as of December 2, 2022. | limited |
Cohere | Cohere medium v20220720 (6.1B) cohere/medium-20220720 | Cohere medium v20220720 (6.1B parameters). | limited |
Cohere | Cohere small v20220720 (410M) cohere/small-20220720 | Cohere small v20220720 (410M parameters), which is deprecated by Cohere as of December 2, 2022. | limited |
Cohere | Cohere xlarge v20221108 (52.4B) cohere/xlarge-20221108 | Cohere xlarge v20221108 (52.4B parameters). | limited |
Cohere | Cohere medium v20221108 (6.1B) cohere/medium-20221108 | Cohere medium v20221108 (6.1B parameters). | limited |
Cohere | Cohere Command beta (6.1B) cohere/command-medium-beta | Cohere Command beta (6.1B parameters) is fine-tuned from the medium model to respond well to instruction-like prompts (details). | limited |
Cohere | Cohere Command beta (52.4B) cohere/command-xlarge-beta | Cohere Command beta (52.4B parameters) is fine-tuned from the XL model to respond well to instruction-like prompts (details). | limited |
Databricks | Dolly V2 (3B) databricks/dolly-v2-3b | Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-2.8b. | open |
Databricks | Dolly V2 (7B) databricks/dolly-v2-7b | Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-6.9b. | open |
Databricks | Dolly V2 (12B) databricks/dolly-v2-12b | Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b. | open |
DeepMind | Gopher (280B) deepmind/gopher | Gopher (280B parameters) (paper). | closed |
DeepMind | Chinchilla (70B) deepmind/chinchilla | Chinchilla (70B parameters) (paper). | closed |
EleutherAI | GPT-J (6B) together/gpt-j-6b | GPT-J (6B parameters) autoregressive language model trained on The Pile (details). | open |
EleutherAI | GPT-NeoX (20B) together/gpt-neox-20b | GPT-NeoX (20B parameters) autoregressive language model trained on The Pile (paper). | open |
EleutherAI | Pythia (3B) together/pythia-3b | Pythia (3B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
EleutherAI | Pythia (7B) together/pythia-7b | Pythia (7B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
EleutherAI | Pythia (12B) together/pythia-12b | Pythia (12B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
Google | T5 (11B) together/t5-11b | T5 (11B parameters) is an encoder-decoder model trained on a multi-task mixture, where each task is converted into a text-to-text format (paper). | open |
Google | UL2 (20B) together/ul2 | UL2 (20B parameters) is an encoder-decoder model trained on the C4 corpus. It's similar to T5 but trained with a different objective and slightly different scaling knobs (paper). | open |
Google | Flan-T5 (11B) together/flan-t5-xxl | Flan-T5 (11B parameters) is T5 fine-tuned on 1.8K tasks (paper). | open |
Google | PaLM (540B) google/palm | Pathways Language Model (540B parameters) is trained using 6144 TPU v4 chips (paper). | closed |
HazyResearch | H3 (2.7B) together/h3-2.7b | H3 (2.7B parameters) is a decoder-only language model based on state space models (paper). | open |
Meta | OPT-IML (175B) together/opt-iml-175b | OPT-IML (175B parameters) is a suite of decoder-only transformer LMs that are multi-task fine-tuned on 2000 datasets (paper). | open |
Meta | OPT-IML (30B) together/opt-iml-30b | OPT-IML (30B parameters) is a suite of decoder-only transformer LMs that are multi-task fine-tuned on 2000 datasets (paper). | open |
Meta | OPT (175B) together/opt-175b | Open Pre-trained Transformers (175B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
Meta | OPT (66B) together/opt-66b | Open Pre-trained Transformers (66B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
Meta | OPT (6.7B) together/opt-6.7b | Open Pre-trained Transformers (6.7B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
Meta | OPT (1.3B) together/opt-1.3b | Open Pre-trained Transformers (1.3B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
Meta | Galactica (120B) together/galactica-120b | Galactica (120B parameters) is trained on 48 million papers, textbooks, lecture notes, compounds and proteins, scientific websites, etc. (paper). | open |
Meta | Galactica (30B) together/galactica-30b | Galactica (30B parameters) is trained on 48 million papers, textbooks, lecture notes, compounds and proteins, scientific websites, etc. (paper). | open |
Meta | LLaMA (7B) huggingface/llama-7b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Stanford | Alpaca (7B) huggingface/alpaca-7b | Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. | open |
Meta | LLaMA (7B) together/llama-7b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Meta | LLaMA (13B) together/llama-13b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Meta | LLaMA (30B) together/llama-30b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Meta | LLaMA (65B) together/llama-65b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Stability AI | StableLM-Base-Alpha (7B) stabilityai/stablelm-base-alpha-7b | StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models. | open |
Stanford | Alpaca (7B) together/alpaca-7b | Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. | open
Stanford | Alpaca (13B) together/alpaca-13b | Alpaca 13B is a model fine-tuned from the LLaMA 13B model on 52K instruction-following demonstrations. | open
Stanford | Alpaca (30B) together/alpaca-30b | Alpaca 30B is a model fine-tuned from the LLaMA 30B model on 52K instruction-following demonstrations. | open
LMSYS | Vicuna (13B) together/vicuna-13b | Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. | open |
Microsoft/NVIDIA | TNLG v2 (530B) microsoft/TNLGv2_530B | TNLG v2 (530B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper). | closed |
Microsoft/NVIDIA | TNLG v2 (6.7B) microsoft/TNLGv2_7B | TNLG v2 (6.7B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper). | closed |
OpenAI | davinci (175B) openai/davinci | Original GPT-3 (175B parameters) autoregressive language model (paper, docs). | limited |
OpenAI | curie (6.7B) openai/curie | Original GPT-3 (6.7B parameters) autoregressive language model (paper, docs). | limited |
OpenAI | babbage (1.3B) openai/babbage | Original GPT-3 (1.3B parameters) autoregressive language model (paper, docs). | limited |
OpenAI | ada (350M) openai/ada | Original GPT-3 (350M parameters) autoregressive language model (paper, docs). | limited |
OpenAI | text-davinci-003 openai/text-davinci-003 | text-davinci-003 model that involves reinforcement learning (PPO) with reward models. Derived from text-davinci-002 (docs). | limited |
OpenAI | text-davinci-002 openai/text-davinci-002 | text-davinci-002 model that involves supervised fine-tuning on human-written demonstrations. Derived from code-davinci-002 (docs). | limited |
OpenAI | text-davinci-001 openai/text-davinci-001 | text-davinci-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
OpenAI | text-curie-001 openai/text-curie-001 | text-curie-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
OpenAI | text-babbage-001 openai/text-babbage-001 | text-babbage-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
OpenAI | text-ada-001 openai/text-ada-001 | text-ada-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
OpenAI | gpt-4-0314 openai/gpt-4-0314 | GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from March 14th 2023. | limited |
OpenAI | gpt-4-32k-0314 openai/gpt-4-32k-0314 | GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from March 14th 2023. | limited |
OpenAI | code-davinci-002 openai/code-davinci-002 | Codex-style model that is designed for pure code-completion tasks (docs). | limited |
OpenAI | code-davinci-001 openai/code-davinci-001 | code-davinci-001 model. | limited |
OpenAI | code-cushman-001 (12B) openai/code-cushman-001 | Codex-style model that is a stronger, multilingual version of the Codex (12B) model in the Codex paper. | limited |
OpenAI | gpt-3.5-turbo-0301 openai/gpt-3.5-turbo-0301 | Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-03-01. | limited |
OpenAI | gpt-3.5-turbo-0613 openai/gpt-3.5-turbo-0613 | Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13. | limited |
OpenAI | ChatGPT openai/chat-gpt | Sibling model to InstructGPT which interacts in a conversational way. See OpenAI's announcement. The size of the model is unknown. | limited |
Together | GPT-JT (6B) together/Together-gpt-JT-6B-v1 | GPT-JT (6B parameters) is a fork of GPT-J (blog post). | open |
Together | GPT-NeoXT-Chat-Base (20B) together/gpt-neoxt-chat-base-20b | GPT-NeoXT-Chat-Base (20B) is fine-tuned from GPT-NeoX, serving as a base model for developing open-source chatbots. | open |
Together | RedPajama-INCITE-Base-v1 (3B) together/redpajama-incite-base-3b-v1 | RedPajama-INCITE-Base-v1 (3B parameters) is a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
Together | RedPajama-INCITE-Instruct-v1 (3B) together/redpajama-incite-instruct-3b-v1 | RedPajama-INCITE-Instruct-v1 (3B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
Together | RedPajama-INCITE-Chat-v1 (3B) together/redpajama-incite-chat-3b-v1 | RedPajama-INCITE-Chat-v1 (3B parameters) is a model fine-tuned on OASST1 and Dolly2 to enhance chatting ability. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
Together | RedPajama-INCITE-Base-v1 (7B) together/redpajama-incite-base-7b-v1 | RedPajama-INCITE-Base-v1 (7B parameters) is a 7 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
MosaicML | MPT (7B) mosaicml/mpt-7b | MPT-7B is a Transformer trained from scratch on 1T tokens of text and code. | open |
MosaicML | MPT-Chat (7B) mosaicml/mpt-7b-chat | MPT-Chat (7B) is a chatbot-like model for dialogue generation. It is built by finetuning MPT-7B, a Transformer trained from scratch on 1T tokens of text and code. | open |
MosaicML | MPT-Instruct (7B) mosaicml/mpt-7b-instruct | MPT-Instruct (7B) is a model for short-form instruction following. It is built by finetuning MPT (7B), a Transformer trained from scratch on 1T tokens of text and code. | open |
Salesforce | CodeGen (16B) together/codegen | CodeGen (16B parameters) is an open dense code model trained for multi-turn program synthesis (blog). | open |
Tsinghua | GLM (130B) together/glm | GLM (130B parameters) is an open bilingual (English & Chinese) bidirectional dense model that was trained using the General Language Model (GLM) procedure (paper). | open |
Tsinghua | CodeGeeX (13B) together/codegeex | CodeGeeX (13B parameters) is an open dense code model trained on more than 20 programming languages on a corpus of more than 850B tokens (blog). | open |
Writer | Palmyra Base (5B) writer/palmyra-base | Palmyra Base (5B) | limited |
Writer | Palmyra Large (20B) writer/palmyra-large | Palmyra Large (20B) | limited |
Writer | InstructPalmyra (30B) writer/palmyra-instruct-30 | InstructPalmyra (30B) | limited |
Writer | Palmyra E (30B) writer/palmyra-e | Palmyra E (30B) | limited |
Writer | Silk Road (35B) writer/silk-road | Silk Road (35B) | limited |
Writer | Palmyra X (43B) writer/palmyra-x | Palmyra X (43B) | limited |
Yandex | YaLM (100B) together/yalm | YaLM (100B parameters) is an autoregressive language model trained on English and Russian text ([GitHub](https://github.com/yandex/YaLM-100B)). | open |
NVIDIA | Megatron GPT2 nvidia/megatron-gpt2 | GPT-2 implemented in Megatron-LM (paper). | open |
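
If you want to consume this catalog programmatically, the sketch below shows one way the table's columns (creator, model identifier, access level) might be represented and filtered. It is a minimal illustration under stated assumptions, not part of HELM or of any provider API: the `ModelEntry` dataclass, the `CATALOG` sample, and `models_with_access` are hypothetical names, and only a handful of rows from the table are included.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelEntry:
    """One row of the table above (field names are assumptions for this sketch)."""
    creator: str   # "Creator" column, e.g. "OpenAI"
    name: str      # display name, e.g. "davinci (175B)"
    model_id: str  # identifier from the "Model" column, e.g. "openai/davinci"
    access: str    # "open", "limited", or "closed"


# A few sample rows copied from the table; a real consumer would list every row.
CATALOG: List[ModelEntry] = [
    ModelEntry("AI21 Labs", "J1-Jumbo v1 (178B)", "ai21/j1-jumbo", "limited"),
    ModelEntry("EleutherAI", "GPT-NeoX (20B)", "together/gpt-neox-20b", "open"),
    ModelEntry("OpenAI", "davinci (175B)", "openai/davinci", "limited"),
    ModelEntry("DeepMind", "Chinchilla (70B)", "deepmind/chinchilla", "closed"),
]


def models_with_access(catalog: List[ModelEntry], access: str) -> List[str]:
    """Return the model identifiers in `catalog` with the given access level."""
    return [entry.model_id for entry in catalog if entry.access == access]


if __name__ == "__main__":
    # List the openly available models among the sample rows.
    print(models_with_access(CATALOG, "open"))  # ['together/gpt-neox-20b']
```

The identifiers and access values in the sample entries are taken verbatim from the table; everything else in the snippet is illustrative scaffolding.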