| Model | Year | Paper |
| --- | --- | --- |
| BERT | 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Training Compute-Optimal Large Language Models (for a fixed compute budget, the best results come not from the largest models but from smaller models trained on more data; see the sketch after this table) |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| Cerebras-GPT | 2023 | Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster |
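
The Chinchilla row above states a scaling result rather than just a paper title, so a small worked example may help. The sketch below is illustrative only and is not taken from any of the papers listed: it assumes the common approximation that training compute is C ≈ 6·N·D FLOPs (N parameters, D training tokens), the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter, and the publicly reported Gopher (280B parameters, ~300B tokens) and Chinchilla (70B parameters, ~1.4T tokens) training configurations.

```python
# Illustrative sketch only. Assumptions (rules of thumb associated with the
# Chinchilla analysis, not exact results from the paper):
#   * training compute  C ~ 6 * N * D   (N = parameters, D = training tokens)
#   * compute-optimal   D ~ 20 * N      (about 20 tokens per parameter)

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs using C ~ 6 * N * D."""
    return 6.0 * params * tokens


def compute_optimal_tokens(params: float) -> float:
    """Chinchilla-style heuristic: roughly 20 training tokens per parameter."""
    return 20.0 * params


if __name__ == "__main__":
    # Gopher: ~280B parameters trained on ~300B tokens (as reported).
    gopher_budget = training_flops(280e9, 300e9)
    # Chinchilla: ~70B parameters trained on ~1.4T tokens (as reported).
    chinchilla_budget = training_flops(70e9, 1.4e12)

    print(f"Gopher budget (approx.):     {gopher_budget:.2e} FLOPs")
    print(f"Chinchilla budget (approx.): {chinchilla_budget:.2e} FLOPs")
    # The 20-tokens-per-parameter heuristic for a 70B model suggests ~1.4T tokens,
    # i.e. the smaller model spends a comparable budget on far more data.
    print(f"Heuristic optimal tokens for 70B params: {compute_optimal_tokens(70e9):.2e}")
```

Under these assumptions the two budgets come out within roughly 20% of each other, which is the sense in which the Chinchilla result favours a smaller model trained on more data over a larger model trained on less.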