LeVanLoi miscellaneous articles

LeVanLoi'log, ⌚ 2023-05-05
***
Model Collection
Author: DAIR.AI
https://www.promptingguide.ai/models/collection

This section collects and summarizes notable and foundational LLMs (data adapted from Papers with Code and the recent work by Zhao et al. (2023)).

 

| Model | Release Date | Description |
|---|---|---|
| BERT | 2018 | Bidirectional Encoder Representations from Transformers |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Shows that, for a fixed compute budget, the best performance comes not from the largest models but from smaller models trained on more data. |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| Cerebras-GPT | 2023 | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |
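The Chinchilla entry in the table compresses a quantitative argument. One common way to make it concrete is the approximation that training a dense Transformer costs roughly C ≈ 6·N·D FLOPs, where N is the parameter count and D is the number of training tokens. The sketch below is illustrative only. It assumes that approximation and the "about 20 tokens per parameter" rule of thumb often drawn from the Chinchilla paper, and the helper names `training_flops` and `compute_optimal` are hypothetical. It compares Gopher (280B parameters, 300B tokens) with Chinchilla (70B parameters, 1.4T tokens), which land at roughly comparable budgets, and then inverts the formula to recover a model size from a budget.

```python
import math

# Assumption: training cost of a dense Transformer is approximated as
# C ≈ 6 * N * D FLOPs (N = parameters, D = training tokens).
def training_flops(params: float, tokens: float) -> float:
    """Approximate training FLOPs for a dense Transformer."""
    return 6.0 * params * tokens


# Assumption: a compute-optimal run uses about `tokens_per_param` tokens
# per parameter (roughly 20 is a commonly quoted rule of thumb).
def compute_optimal(budget_flops: float, tokens_per_param: float = 20.0) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) with D = k * N.

    Substituting D = k * N into C = 6 * N * D gives C = 6 * k * N**2,
    so N = sqrt(C / (6 * k)).
    """
    params = math.sqrt(budget_flops / (6.0 * tokens_per_param))
    return params, tokens_per_param * params


if __name__ == "__main__":
    # Configurations of the two models compared in the Chinchilla paper.
    models = {
        "Gopher": (280e9, 300e9),
        "Chinchilla": (70e9, 1.4e12),
    }
    for name, (n, d) in models.items():
        c = training_flops(n, d)
        print(f"{name}: ~{c:.2e} FLOPs, ~{d / n:.1f} tokens per parameter")

    # Invert the estimate: what split would a Chinchilla-scale budget suggest?
    n_opt, d_opt = compute_optimal(training_flops(70e9, 1.4e12))
    print(f"Suggested: ~{n_opt / 1e9:.0f}B parameters, ~{d_opt / 1e12:.1f}T tokens")
```

Under these assumptions, Chinchilla's budget maps back to about 70B parameters and 1.4T tokens, while Gopher spends a similar budget on 4x more parameters and only about one token per parameter.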