Models
Creator | Model | Description | Access |
---|---|---|---|
AI21 Labs | J1-Jumbo v1 (178B) ai21/j1-jumbo | Jurassic-1 Jumbo (178B parameters) (docs, tech report). | limited |
AI21 Labs | J1-Large v1 (7.5B) ai21/j1-large | Jurassic-1 Large (7.5B parameters) (docs, tech report). | limited |
AI21 Labs | J1-Grande v1 (17B) ai21/j1-grande | Jurassic-1 Grande (17B parameters) with a "few tweaks" to the training process (docs, tech report). | limited |
AI21 Labs | J1-Grande v2 beta (17B) ai21/j1-grande-v2-beta | Jurassic-1 Grande v2 beta (17B parameters). | limited |
AI21 Labs | Jurassic-2 Jumbo (178B) ai21/j2-jumbo | Jurassic-2 Jumbo (178B parameters) (docs). | limited |
AI21 Labs | Jurassic-2 Grande (17B) ai21/j2-grande | Jurassic-2 Grande (17B parameters) (docs). | limited |
AI21 Labs | Jurassic-2 Large (7.5B) ai21/j2-large | Jurassic-2 Large (7.5B parameters) (docs). | limited |
Aleph Alpha | Luminous Base (13B) AlephAlpha/luminous-base | Luminous Base (13B parameters) (docs). | limited |
Aleph Alpha | Luminous Extended (30B) AlephAlpha/luminous-extended | Luminous Extended (30B parameters) (docs). | limited |
Aleph Alpha | Luminous Supreme (70B) AlephAlpha/luminous-supreme | Luminous Supreme (70B parameters) (docs). | limited |
Anthropic | Anthropic-LM v4-s3 (52B) anthropic/stanford-online-all-v4-s3 | A 52B-parameter language model trained using reinforcement learning from human feedback (paper). | closed |
Anthropic | Anthropic Claude v1.3 anthropic/claude-v1.3 | A model trained using reinforcement learning from human feedback (docs). | limited |
Anthropic | Anthropic Claude Instant V1 anthropic/claude-instant-v1 | A lightweight version of Claude, a model trained using reinforcement learning from human feedback (docs). | limited |
UC Berkeley | Koala (13B) together/koala-13b | Koala (13B) is a chatbot fine-tuned from LLaMA (13B) on dialogue data gathered from the web (blog post). | open |
BigScience | BLOOM (176B) together/bloom | BLOOM (176B parameters) is an autoregressive model trained on 46 natural languages and 13 programming languages (paper). | open |
BigScience | BLOOMZ (176B) together/bloomz | BLOOMZ (176B parameters) is BLOOM that has been fine-tuned on natural language instructions (details). | open |
BigScience | T0pp (11B) together/t0pp | T0pp (11B parameters) is an encoder-decoder model trained on a large set of different tasks specified in natural language prompts (paper). | open |
BigCode | SantaCoder (1.1B) huggingface/santacoder | SantaCoder (1.1B parameters) is a model trained on the Python, Java, and JavaScript subset of The Stack (v1.1) (model card). | open |
BigCode | StarCoder (15.5B) huggingface/starcoder | StarCoder (15.5B parameters) is a model trained on 80+ programming languages from The Stack (v1.2) (model card). | open |
Cerebras | Cerebras GPT (6.7B) together/cerebras-gpt-6.7b | Cerebras GPT is a family of open compute-optimal language models scaled from 111M to 13B parameters trained on the Eleuther Pile. (paper) | limited |
Cerebras | Cerebras GPT (13B) together/cerebras-gpt-13b | Cerebras GPT is a family of open compute-optimal language models scaled from 111M to 13B parameters trained on the Eleuther Pile. (paper) | limited |
Cohere | Cohere xlarge v20220609 (52.4B) cohere/xlarge-20220609 | Cohere xlarge v20220609 (52.4B parameters). | limited |
Cohere | Cohere large v20220720 (13.1B) cohere/large-20220720 | Cohere large v20220720 (13.1B parameters), which is deprecated by Cohere as of December 2, 2022. | limited |
Cohere | Cohere medium v20220720 (6.1B) cohere/medium-20220720 | Cohere medium v20220720 (6.1B parameters). | limited |
Cohere | Cohere small v20220720 (410M) cohere/small-20220720 | Cohere small v20220720 (410M parameters), which is deprecated by Cohere as of December 2, 2022. | limited |
Cohere | Cohere xlarge v20221108 (52.4B) cohere/xlarge-20221108 | Cohere xlarge v20221108 (52.4B parameters). | limited |
Cohere | Cohere medium v20221108 (6.1B) cohere/medium-20221108 | Cohere medium v20221108 (6.1B parameters). | limited |
Cohere | Cohere Command beta (6.1B) cohere/command-medium-beta | Cohere Command beta (6.1B parameters) is fine-tuned from the medium model to respond well to instruction-like prompts (details). | limited |
Cohere | Cohere Command beta (52.4B) cohere/command-xlarge-beta | Cohere Command beta (52.4B parameters) is fine-tuned from the XL model to respond well to instruction-like prompts (details). | limited |
Databricks | Dolly V2 (3B) databricks/dolly-v2-3b | Dolly V2 (3B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-2.8b. | open |
Databricks | Dolly V2 (7B) databricks/dolly-v2-7b | Dolly V2 (7B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-6.9b. | open |
Databricks | Dolly V2 (12B) databricks/dolly-v2-12b | Dolly V2 (12B) is an instruction-following large language model trained on the Databricks machine learning platform. It is based on pythia-12b. | open |
DeepMind | Gopher (280B) deepmind/gopher | Gopher (280B parameters) (paper). | closed |
DeepMind | Chinchilla (70B) deepmind/chinchilla | Chinchilla (70B parameters) (paper). | closed |
EleutherAI | GPT-J (6B) together/gpt-j-6b | GPT-J (6B parameters) autoregressive language model trained on The Pile (details). | open |
EleutherAI | GPT-NeoX (20B) together/gpt-neox-20b | GPT-NeoX (20B parameters) autoregressive language model trained on The Pile (paper). | open |
EleutherAI | Pythia (3B) together/pythia-3b | Pythia (3B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
EleutherAI | Pythia (7B) together/pythia-7b | Pythia (7B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
EleutherAI | Pythia (12B) together/pythia-12b | Pythia (12B parameters). The Pythia project combines interpretability analysis and scaling laws to understand how knowledge develops and evolves during training in autoregressive transformers. | open |
Google | T5 (11B) together/t5-11b | T5 (11B parameters) is an encoder-decoder model trained on a multi-task mixture, where each task is converted into a text-to-text format (paper). | open |
Google | UL2 (20B) together/ul2 | UL2 (20B parameters) is an encoder-decoder model trained on the C4 corpus. It's similar to T5 but trained with a different objective and slightly different scaling knobs (paper). | open |
Google | Flan-T5 (11B) together/flan-t5-xxl | Flan-T5 (11B parameters) is T5 fine-tuned on 1.8K tasks (paper). | open |
Google | PaLM (540B) google/palm | Pathways Language Model (540B parameters) is trained using 6144 TPU v4 chips (paper). | closed |
HazyResearch | H3 (2.7B) together/h3-2.7b | H3 (2.7B parameters) is a decoder-only language model based on state space models (paper). | open |
Meta | OPT-IML (175B) together/opt-iml-175b | OPT-IML (175B parameters) is a suite of decoder-only transformer LMs that are multi-task fine-tuned on 2000 datasets (paper). | open |
Meta | OPT-IML (30B) together/opt-iml-30b | OPT-IML (30B parameters) is a suite of decoder-only transformer LMs that are multi-task fine-tuned on 2000 datasets (paper). | open |
Meta | OPT (175B) together/opt-175b | Open Pre-trained Transformers (175B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
Meta | OPT (66B) together/opt-66b | Open Pre-trained Transformers (66B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
Meta | OPT (6.7B) together/opt-6.7b | Open Pre-trained Transformers (6.7B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
Meta | OPT (1.3B) together/opt-1.3b | Open Pre-trained Transformers (1.3B parameters) is a suite of decoder-only pre-trained transformers that are fully and responsibly shared with interested researchers (paper). | open |
Meta | Galactica (120B) together/galactica-120b | Galactica (120B parameters) is trained on 48 million papers, textbooks, lecture notes, compounds and proteins, scientific websites, etc. (paper). | open |
Meta | Galactica (30B) together/galactica-30b | Galactica (30B parameters) is trained on 48 million papers, textbooks, lecture notes, compounds and proteins, scientific websites, etc. (paper). | open |
Meta | LLaMA (7B) huggingface/llama-7b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Stanford | Alpaca (7B) huggingface/alpaca-7b | Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. | open |
Meta | LLaMA (7B) together/llama-7b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Meta | LLaMA (13B) together/llama-13b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Meta | LLaMA (30B) together/llama-30b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Meta | LLaMA (65B) together/llama-65b | LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. | open |
Stability AI | StableLM-Base-Alpha (7B) stabilityai/stablelm-base-alpha-7b | StableLM-Base-Alpha is a suite of 3B and 7B parameter decoder-only language models pre-trained on a diverse collection of English datasets with a sequence length of 4096 to push beyond the context window limitations of existing open-source language models. | open |
Stanford | Alpaca (7B) together/alpaca-7b | Alpaca 7B is a model fine-tuned from the LLaMA 7B model on 52K instruction-following demonstrations. | open
Stanford | Alpaca (13B) together/alpaca-13b | Alpaca 13B is a model fine-tuned from the LLaMA 13B model on 52K instruction-following demonstrations. | open
Stanford | Alpaca (30B) together/alpaca-30b | Alpaca 30B is a model fine-tuned from the LLaMA 30B model on 52K instruction-following demonstrations. | open
LMSYS | Vicuna (13B) together/vicuna-13b | Vicuna-13B is an open-source chatbot trained by fine-tuning LLaMA on user-shared conversations collected from ShareGPT. | open |
Microsoft/NVIDIA | TNLG v2 (530B) microsoft/TNLGv2_530B | TNLG v2 (530B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper). | closed |
Microsoft/NVIDIA | TNLG v2 (6.7B) microsoft/TNLGv2_7B | TNLG v2 (6.7B parameters) autoregressive language model trained on a filtered subset of the Pile and CommonCrawl (paper). | closed |
OpenAI | davinci (175B) openai/davinci | Original GPT-3 (175B parameters) autoregressive language model (paper, docs). | limited |
OpenAI | curie (6.7B) openai/curie | Original GPT-3 (6.7B parameters) autoregressive language model (paper, docs). | limited |
OpenAI | babbage (1.3B) openai/babbage | Original GPT-3 (1.3B parameters) autoregressive language model (paper, docs). | limited |
OpenAI | ada (350M) openai/ada | Original GPT-3 (350M parameters) autoregressive language model (paper, docs). | limited |
OpenAI | text-davinci-003 openai/text-davinci-003 | text-davinci-003 model that involves reinforcement learning (PPO) with reward models. Derived from text-davinci-002 (docs). | limited |
OpenAI | text-davinci-002 openai/text-davinci-002 | text-davinci-002 model that involves supervised fine-tuning on human-written demonstrations. Derived from code-davinci-002 (docs). | limited |
OpenAI | text-davinci-001 openai/text-davinci-001 | text-davinci-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
OpenAI | text-curie-001 openai/text-curie-001 | text-curie-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
OpenAI | text-babbage-001 openai/text-babbage-001 | text-babbage-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
OpenAI | text-ada-001 openai/text-ada-001 | text-ada-001 model that involves supervised fine-tuning on human-written demonstrations (docs). | limited |
OpenAI | gpt-4-0314 openai/gpt-4-0314 | GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 from March 14th 2023. | limited |
OpenAI | gpt-4-32k-0314 openai/gpt-4-32k-0314 | GPT-4 is a large multimodal model (currently only accepting text inputs and emitting text outputs) that is optimized for chat but works well for traditional completions tasks. Snapshot of gpt-4 with a longer context length of 32,768 tokens from March 14th 2023. | limited |
OpenAI | code-davinci-002 openai/code-davinci-002 | Codex-style model that is designed for pure code-completion tasks (docs). | limited |
OpenAI | code-davinci-001 openai/code-davinci-001 | code-davinci-001 model. | limited |
OpenAI | code-cushman-001 (12B) openai/code-cushman-001 | Codex-style model that is a stronger, multilingual version of the Codex (12B) model in the Codex paper. | limited |
OpenAI | gpt-3.5-turbo-0301 openai/gpt-3.5-turbo-0301 | Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-03-01. | limited |
OpenAI | gpt-3.5-turbo-0613 openai/gpt-3.5-turbo-0613 | Sibling model of text-davinci-003 that is optimized for chat but works well for traditional completions tasks as well. Snapshot from 2023-06-13. | limited |
OpenAI | ChatGPT openai/chat-gpt | Sibling model to InstructGPT which interacts in a conversational way. See OpenAI's announcement. The size of the model is unknown. | limited |
Together | GPT-JT (6B) together/Together-gpt-JT-6B-v1 | GPT-JT (6B parameters) is a fork of GPT-J (blog post). | open |
Together | GPT-NeoXT-Chat-Base (20B) together/gpt-neoxt-chat-base-20b | GPT-NeoXT-Chat-Base (20B) is fine-tuned from GPT-NeoX, serving as a base model for developing open-source chatbots. | open |
Together | RedPajama-INCITE-Base-v1 (3B) together/redpajama-incite-base-3b-v1 | RedPajama-INCITE-Base-v1 (3B parameters) is a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
Together | RedPajama-INCITE-Instruct-v1 (3B) together/redpajama-incite-instruct-3b-v1 | RedPajama-INCITE-Instruct-v1 (3B parameters) is a model fine-tuned for few-shot applications on the data of GPT-JT. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
Together | RedPajama-INCITE-Chat-v1 (3B) together/redpajama-incite-chat-3b-v1 | RedPajama-INCITE-Chat-v1 (3B parameters) is a model fine-tuned on OASST1 and Dolly2 to enhance chatting ability. It is built from RedPajama-INCITE-Base-v1 (3B), a 3 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
Together | RedPajama-INCITE-Base-v1 (7B) together/redpajama-incite-base-7b-v1 | RedPajama-INCITE-Base-v1 (7B parameters) is a 7 billion parameter base model that aims to replicate the LLaMA recipe as closely as possible. | open |
MosaicML | MPT (7B) mosaicml/mpt-7b | MPT-7B is a Transformer trained from scratch on 1T tokens of text and code. | open |
MosaicML | MPT-Chat (7B) mosaicml/mpt-7b-chat | MPT-Chat (7B) is a chatbot-like model for dialogue generation. It is built by finetuning MPT-7B, a Transformer trained from scratch on 1T tokens of text and code. | open |
MosaicML | MPT-Instruct (7B) mosaicml/mpt-7b-instruct | MPT-Instruct (7B) is a model for short-form instruction following. It is built by finetuning MPT (7B), a Transformer trained from scratch on 1T tokens of text and code. | open |
Salesforce | CodeGen (16B) together/codegen | CodeGen (16B parameters) is an open dense code model trained for multi-turn program synthesis (blog). | open |
Tsinghua | GLM (130B) together/glm | GLM (130B parameters) is an open bilingual (English & Chinese) bidirectional dense model that was trained using the General Language Model (GLM) procedure (paper). | open |
Tsinghua | CodeGeeX (13B) together/codegeex | CodeGeeX (13B parameters) is an open dense code model trained on more than 20 programming languages on a corpus of more than 850B tokens (blog). | open |
Writer | Palmyra Base (5B) writer/palmyra-base | Palmyra Base (5B) | limited |
Writer | Palmyra Large (20B) writer/palmyra-large | Palmyra Large (20B) | limited |
Writer | InstructPalmyra (30B) writer/palmyra-instruct-30 | InstructPalmyra (30B) | limited |
Writer | Palmyra E (30B) writer/palmyra-e | Palmyra E (30B) | limited |
Writer | Silk Road (35B) writer/silk-road | Silk Road (35B) | limited |
Writer | Palmyra X (43B) writer/palmyra-x | Palmyra X (43B) | limited |
Yandex | YaLM (100B) together/yalm | YaLM (100B parameters) is an autoregressive language model trained on English and Russian text ([GitHub](https://github.com/yandex/YaLM-100B)). | open |
NVIDIA | Megatron GPT2 nvidia/megatron-gpt2 | GPT-2 implemented in Megatron-LM (paper). | open |
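
If you want to consume this catalog programmatically, the sketch below shows one way the table's columns (creator, model identifier, access level) might be represented and filtered. It is a minimal illustration under stated assumptions, not part of HELM or of any provider API: the `ModelEntry` dataclass, the `CATALOG` sample, and `models_with_access` are hypothetical names, and only a handful of rows from the table are included.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelEntry:
    """One row of the table above (field names are assumptions for this sketch)."""
    creator: str   # "Creator" column, e.g. "OpenAI"
    name: str      # display name, e.g. "davinci (175B)"
    model_id: str  # identifier from the "Model" column, e.g. "openai/davinci"
    access: str    # "open", "limited", or "closed"


# A few sample rows copied from the table; a real consumer would list every row.
CATALOG: List[ModelEntry] = [
    ModelEntry("AI21 Labs", "J1-Jumbo v1 (178B)", "ai21/j1-jumbo", "limited"),
    ModelEntry("EleutherAI", "GPT-NeoX (20B)", "together/gpt-neox-20b", "open"),
    ModelEntry("OpenAI", "davinci (175B)", "openai/davinci", "limited"),
    ModelEntry("DeepMind", "Chinchilla (70B)", "deepmind/chinchilla", "closed"),
]


def models_with_access(catalog: List[ModelEntry], access: str) -> List[str]:
    """Return the model identifiers in `catalog` with the given access level."""
    return [entry.model_id for entry in catalog if entry.access == access]


if __name__ == "__main__":
    # List the openly available models among the sample rows.
    print(models_with_access(CATALOG, "open"))  # ['together/gpt-neox-20b']
```

The identifiers and access values in the sample entries are taken verbatim from the table; everything else in the snippet is illustrative scaffolding.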