Models
| Creator | Model | Description | Access | 
|---|---|---|---|
| AI21 Labs | J1-Jumbo v1 (178B) ai21/j1-jumbo | Jurassic-1 Jumbo (178B parameters) (docs, tech report). | limited |
| AI21 Labs | J1-Large v1 (7.5B) ai21/j1-large | Jurassic-1 Large (7.5B parameters) (docs, tech report). | limited |
| AI21 Labs | J1-Grande v1 (17B) ai21/j1-grande | Jurassic-1 Grande (17B parameters) with a "few tweaks" to the training process (docs, tech report). | limited |
| AI21 Labs | J1-Grande v2 beta (17B) ai21/j1-grande-v2-beta | Jurassic-1 Grande v2 beta (17B parameters). | limited |
| AI21 Labs | Jurassic-2 Jumbo (178B) ai21/j2-jumbo | Jurassic-2 Jumbo (178B parameters) (docs). | limited |
| AI21 Labs | Jurassic-2 Grande (17B) ai21/j2-grande | Jurassic-2 Grande (17B parameters) (docs). | limited |
| AI21 Labs | Jurassic-2 Large (7.5B) ai21/j2-large | Jurassic-2 Large (7.5B parameters) (docs). | limited |
| Aleph Alpha | Luminous Base (13B) AlephAlpha/luminous-base | Luminous Base (13B parameters) (docs). | limited |
| Aleph Alpha | Luminous Extended (30B) AlephAlpha/luminous-extended | Luminous Extended (30B parameters) (docs). | limited |
| Aleph Alpha | Luminous Supreme (70B) AlephAlpha/luminous-supreme | Luminous Supreme (70B parameters) (docs). | limited |
| Anthropic | Anthropic-LM v4-s3 (52B) anthropic/stanford-online-all-v4-s3 | A 52B-parameter language model trained using reinforcement learning from human feedback (paper). | closed |
| Anthropic | Anthropic Claude v1.3 anthropic/claude-v1.3 | A model trained using reinforcement learning from human feedback (docs). | limited |
| Anthropic | Anthropic Claude Instant V1 anthropic/claude-instant-v1 | A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs). | limited |
| UC Berkeley | Koala (13B) together/koala-13b  | Koala (13B) is a chatbot fine-tuned from Llama (13B) on dialogue data gathered from the web. (blog post)  | open  | 
| BigScience | BLOOM (176B) together/bloom | BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages (paper). | open |
| BigScience | BLOOMZ (176B) together/bloomz | BLOOMZ (176B parameters) is BLOOM that has been fine-tuned on natural language instructions (details). | open |
| BigScience | T0pp (11B) together/t0pp | T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts (paper). | open |
| BigCode | SantaCoder (1.1B) huggingface/santacoder  | SantaCoder (1.1B parameters) model trained on the Python, Java, and JavaScript subset of The Stack (v1.1) (model card).  | open  | 
| BigCode | StarCoder (15.5B) huggingface/starcoder  | The StarCoder (15.5B parameter) model trained on 80+ programming languages from The Stack (v1.2) (model card).  | open  | 
| Cerebras | Cerebras GPT (6.7B) together/cerebras-gpt-6.7b  | Cerebras GPT is a family of open compute-optimal language models scaled from 111M to 13B parameters trained on the Eleuther Pile. (paper)  | limited  | 
| Cerebras | Cerebras GPT (13B) together/cerebras-gpt-13b  | Cerebras GPT is a family of open compute-optimal language models scaled from 111M to 13B parameters trained on the Eleuther Pile. (paper)  | limited  | 
| Cohere | Cohere xlarge v20220609 (52.4B) cohere/xlarge-20220609 | Cohere xlarge v20220609 (52.4B parameters). | limited |
| Cohere | Cohere large v20220720 (13.1B) cohere/large-20220720 | Cohere large v20220720 (13.1B parameters), which is deprecated by Cohere as of December 2, 2022. | limited |
| Cohere | Cohere medium v20220720 (6.1B) cohere/medium-20220720 | Cohere medium v20220720 (6.1B parameters). | limited |
| Cohere | Cohere small v20220720 (410M) cohere/small-20220720 | Cohere small v20220720 (410M parameters), which is deprecated by Cohere as of December 2, 2022. | limited |
| Cohere | Cohere xlarge v20221108 (52.4B) cohere/xlarge-20221108 | Cohere xlarge v20221108 (52.4B parameters). | limited |
| Cohere | Cohere medium v20221108 (6.1B) cohere/medium-20221108 | Cohere medium v20221108 (6.1B parameters). | limited |
| Cohere | Cohere Command beta (6.1B) cohere/command-medium-beta | Cohere Command beta (6.1B parameters) is fine-tuned from the medium model to respond well to instruction-like prompts (details). | limited |
| Cohere | Cohere Command beta (52.4B) cohere/command-xlarge-beta | Cohere Command beta (52.4B parameters) is fine-tuned from the XL model to respond well to instruction-like prompts (details). | limited |
| Databricks | Dolly V2 (3B) databricks/dolly-v2-3b | Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-2.8b. | open |
| Databricks | Dolly V2 (7B) databricks/dolly-v2-7b | Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-6.9b. | open |
| Databricks | Dolly V2 (12B) databricks/dolly-v2-12b  | Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b.  | open  | 
| DeepMind | Gopher (280B) deepmind/gopher | Gopher (280B parameters) (paper). | closed |
| DeepMind | Chinchilla (70B) deepmind/chinchilla  | Chinchilla (70B parameters) (paper).  | closed  | 
| EleutherAI | GPT-J (6B) together/gpt-j-6b | GPT-J (6B parameters) autoregressive language model trained on The Pile (details). | open |
| EleutherAI | GPT-NeoX (20B) together/gpt-neox-20b | GPT-NeoX (20B parameters) autoregressive language model trained on The Pile (paper). | open |
| EleutherAI | Pythia (3B) together/pythia-3b | Pythia (3B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
| EleutherAI | Pythia (7B) together/pythia-7b | Pythia (7B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
| EleutherAI | Pythia (12B) together/pythia-12b | Pythia (12B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
| Google | T5 (11B) together/t5-11b | T5 (11B parameters) is an encoder-decoder model trained on a multi-task mixture, where each task is converted into a text-to-text format (paper). | open |
| Google | UL2 (20B) together/ul2 | UL2 (20B parameters) is an encoder-decoder model trained on the C4 corpus. It's similar to T5 but trained with a different objective and slightly different scaling knobs (paper). | open |
| Google | Flan-T5 (11B) together/flan-t5-xxl | Flan-T5 (11B parameters) is T5 fine-tuned on 1.8K tasks (paper). | open |
| Google | PaLM (540B) google/palm | Pathways Language Model (540B parameters) is trained using 6144 TPU v4 chips (paper). | closed |
| HazyResearch | H3 (2.7B) together/h3-2.7b  | H3 (2.7B parameters) is a decoder-only language model based on state space models (paper).  | open  | 
| Meta | OPT-IML (175B) together/opt-iml-175b | OPT-IML (175B parameters) is a suite of decoder-only transformer LMs that are multi-task fine-tuned on 2000 datasets (paper). | open |
| Meta | OPT-IML (30B) together/opt-iml-30b | OPT-IML (30B parameters) is a suite of decoder-only transformer LMs that are multi-task fine-tuned on 2000 datasets (paper). | open |
| Meta | OPT (175B) together/opt-175b | Open Pre-trained Transformers (175B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
| Meta | OPT (66B) together/opt-66b | Open Pre-trained Transformers (66B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
| Meta | OPT (6.7B) together/opt-6.7b | Open Pre-trained Transformers (6.7B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
| Meta | OPT (1.3B) together/opt-1.3b | Open Pre-trained Transformers (1.3B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
| Meta | Galactica (120B) together/galactica-120b | Galactica (120B parameters) is trained on 48 million papers, textbooks, lecture notes, compounds and proteins, scientific websites, etc. (paper). | open |
| Meta | Galactica (30B) together/galactica-30b | Galactica (30B parameters) is trained on 48 million papers, textbooks, lecture notes, compounds and proteins, scientific websites, etc. (paper). | open |
| Meta | LLaMA (7B) huggingface/llama-7b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Stanford | Alpaca (7B) huggingface/alpaca-7b  | Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations.  | open  | 
| Meta | LLaMA (7B) together/llama-7b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Meta | LLaMA (13B) together/llama-13b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Meta | LLaMA (30B) together/llama-30b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Meta | LLaMA (65B) together/llama-65b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
| Stability AI | StableLM-Base-Alpha (7B) stabilityai/stablelm-base-alpha-7b  | StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models.  | open  | 
| Stanford | Alpaca (7B) together/alpaca-7b  | Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations  | open  | 
| Stanford | Alpaca (13B) together/alpaca-13b  | Alpaca 13B is a model fine-tuned from the LLaMA 13B model on 52K instruction-following demonstrations  | open  | 
| Stanford | Alpaca (30B) together/alpaca-30b  | Alpaca 30B is a model fine-tuned from the LLaMA 30B model on 52K instruction-following demonstrations  | open  | 
| LMSYS | Vicuna (13B) together/vicuna-13b  | Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT.  | open  | 
| Microsoft/NVIDIA | TNLG v2 (530B) microsoft/TNLGv2_530B | TNLG v2 (530B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper). | closed |
| Microsoft/NVIDIA | TNLG v2 (6.7B) microsoft/TNLGv2_7B | TNLG v2 (6.7B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper). | closed |
| OpenAI | davinci (175B) openai/davinci | Original GPT-3 (175B parameters) autoregressive language model (paper, docs). | limited |
| OpenAI | curie (6.7B) openai/curie | Original GPT-3 (6.7B parameters) autoregressive language model (paper, docs). | limited |
| OpenAI | babbage (1.3B) openai/babbage | Original GPT-3 (1.3B parameters) autoregressive language model (paper, docs). | limited |
| OpenAI | ada (350M) openai/ada | Original GPT-3 (350M parameters) autoregressive language model (paper, docs). | limited |
| OpenAI | text-davinci-003 openai/text-davinci-003 | text-davinci-003 model that involves reinforcement learning (PPO) with reward models. Derived from text-davinci-002 (docs). | limited |
| OpenAI | text-davinci-002 openai/text-davinci-002 | text-davinci-002 model that involves supervised fine-tuning on human-written demonstrations. Derived from code-davinci-002 (docs). | limited |
| OpenAI | text-davinci-001 openai/text-davinci-001 | text-davinci-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
| OpenAI | text-curie-001 openai/text-curie-001 | text-curie-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
| OpenAI | text-babbage-001 openai/text-babbage-001 | text-babbage-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
| OpenAI | text-ada-001 openai/text-ada-001 | text-ada-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
| OpenAI | gpt-4-0314 openai/gpt-4-0314 | GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from March 14th 2023. | limited |
| OpenAI | gpt-4-32k-0314 openai/gpt-4-32k-0314 | GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from March 14th 2023. | limited |
| OpenAI | code-davinci-002 openai/code-davinci-002 | Codex-style model that is designed for pure code-completion tasks (docs). | limited |
| OpenAI | code-davinci-001 openai/code-davinci-001 | code-davinci-001 model. | limited |
| OpenAI | code-cushman-001 (12B) openai/code-cushman-001 | Codex-style model that is a stronger, multilingual version of the Codex (12B) model in the Codex paper. | limited |
| OpenAI | gpt-3.5-turbo-0301 openai/gpt-3.5-turbo-0301 | Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-03-01. | limited |
| OpenAI | gpt-3.5-turbo-0613 openai/gpt-3.5-turbo-0613 | Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13. | limited |
| OpenAI | ChatGPT openai/chat-gpt | Sibling model to InstructGPT that interacts in a conversational way. See OpenAI's announcement. The size of the model is unknown. | limited |
| Together | GPT-JT (6B) together/Together-gpt-JT-6B-v1 | GPT-JT (6B parameters) is a fork of GPT-J (blog post). | open |
| Together | GPT-NeoXT-Chat-Base (20B) together/gpt-neoxt-chat-base-20b | GPT-NeoXT-Chat-Base (20B) is fine-tuned from GPT-NeoX, serving as a base model for developing open-source chatbots. | open |
| Together | RedPajama-INCITE-Base-v1 (3B) together/redpajama-incite-base-3b-v1 | RedPajama-INCITE-Base-v1 (3B parameters) is a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
| Together | RedPajama-INCITE-Instruct-v1 (3B) together/redpajama-incite-instruct-3b-v1 | RedPajama-INCITE-Instruct-v1 (3B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
| Together | RedPajama-INCITE-Chat-v1 (3B) together/redpajama-incite-chat-3b-v1 | RedPajama-INCITE-Chat-v1 (3B parameters) is a model fine-tuned on OASST1 and Dolly2 to enhance chatting ability. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
| Together | RedPajama-INCITE-Base-v1 (7B) together/redpajama-incite-base-7b-v1 | RedPajama-INCITE-Base-v1 (7B parameters) is a 7 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
| MosaicML | MPT (7B) mosaicml/mpt-7b  | MPT-7B is a Transformer trained from scratch on 1T tokens of text and code.  | open  | 
| MosaicML | MPT-Chat (7B) mosaicml/mpt-7b-chat  | MPT-Chat (7B) is a chatbot-like model for dialogue generation. It is built by finetuning MPT-7B, a Transformer trained from scratch on 1T tokens of text and code.  | open  | 
| MosaicML | MPT-Instruct (7B) mosaicml/mpt-7b-instruct  | MPT-Instruct (7B) is a model for short-form instruction following. It is built by finetuning MPT (7B), a Transformer trained from scratch on 1T tokens of text and code.  | open  | 
| Salesforce | CodeGen (16B) together/codegen | CodeGen (16B parameters) is an open dense code model trained for multi-turn program synthesis (blog). | open |
| Tsinghua | GLM (130B) together/glm | GLM (130B parameters) is an open bilingual (English & Chinese) bidirectional dense model that was trained using the General Language Model (GLM) procedure (paper). | open |
| Tsinghua | CodeGeeX (13B) together/codegeex | CodeGeeX (13B parameters) is an open dense code model trained on more than 20 programming languages on a corpus of more than 850B tokens (blog). | open |
| Writer | Palmyra Base (5B) writer/palmyra-base  | Palmyra Base (5B)  | limited  | 
| Writer | Palmyra Large (20B) writer/palmyra-large  | Palmyra Large (20B)  | limited  | 
| Writer | InstructPalmyra (30B) writer/palmyra-instruct-30  | InstructPalmyra (30B)  | limited  | 
| Writer | Palmyra E (30B) writer/palmyra-e  | Palmyra E (30B)  | limited  | 
| Writer | Silk Road (35B) writer/silk-road  | Silk Road (35B)  | limited  | 
| Writer | Palmyra X (43B) writer/palmyra-x  | Palmyra X (43B)  | limited  | 
| Yandex | YaLM (100B) together/yalm | YaLM (100B parameters) is an autoregressive language model trained on English and Russian text ([GitHub](https://github.com/yandex/YaLM-100B)). | open |
| NVIDIA | Megatron GPT2 nvidia/megatron-gpt2 | GPT-2 implemented in Megatron-LM (paper). | open |
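
Each row keys a model by a `creator/model` identifier and an access tier (`open`, `limited`, or `closed`). As a minimal, self-contained sketch of how this registry could be consumed programmatically, the Python below transcribes a few rows and filters them by access tier. The `Model` dataclass and the `by_access`/`lookup` helpers are hypothetical illustrations for this page, not part of any official API.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Model:
    creator: str       # organization that built the model
    display_name: str  # human-readable name, e.g. "LLaMA (65B)"
    model_id: str      # identifier in creator/model form, e.g. "together/llama-65b"
    access: str        # "open", "limited", or "closed"

# A handful of rows transcribed from the table above (not the full list).
MODELS = [
    Model("AI21 Labs", "J1-Jumbo v1 (178B)", "ai21/j1-jumbo", "limited"),
    Model("BigScience", "BLOOM (176B)", "together/bloom", "open"),
    Model("DeepMind", "Chinchilla (70B)", "deepmind/chinchilla", "closed"),
    Model("Meta", "LLaMA (65B)", "together/llama-65b", "open"),
    Model("OpenAI", "gpt-4-0314", "openai/gpt-4-0314", "limited"),
]

def by_access(tier: str) -> List[Model]:
    """Return every registered model with the given access tier."""
    return [m for m in MODELS if m.access == tier]

def lookup(model_id: str) -> Optional[Model]:
    """Find a model by its creator/model identifier, or None if absent."""
    return next((m for m in MODELS if m.model_id == model_id), None)

if __name__ == "__main__":
    print([m.model_id for m in by_access("open")])    # ['together/bloom', 'together/llama-65b']
    print(lookup("openai/gpt-4-0314").display_name)   # 'gpt-4-0314'
```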