Paper: Language Models are Unsupervised Multitask Learners
Link: https://bit.ly/3vgaVJc
Authors: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever Code: https://github.com/openai/gpt-2
I also made an annotated version of the paper which you can find here
What? The paper demonstrates that language models begin to learn NLP tasks like question answering, machine translation, reading comprehension and summarization without any explicit supervision. The results shown are obtained after training the model on a new dataset of millions of web pages called WebText.
Paper: XLNet: Generalized Autoregressive Pretraining for Language Understanding
Link: https://arxiv.org/pdf/1906.08237.pdf
Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Code: https://github.com/zihangdai/xlnet
What? The paper proposes XLNet, a generalized autoregressive pretraining method that enables learning bidirectional contexts over all permutations of the factorization order and overcomes the limitations of BERT due to the autoregressive formulation of XLNet. XLNet incorporates Transformer-XL as the underlying model. It outperforms BERT in 20 NLP tasks like question answering, natural language inference, sentiment analysis and document ranking.
Paper: BERT - Pre-training of Deep Bidirectional Transformers for Language Understanding
Link: https://bit.ly/3bdTUra Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Code: https://bit.ly/3vRXlM7
What? The paper proposes BERT which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations from unlabeled text. It performs a joint conditioning on both left and right context in all the layers. The pre-trained BERT model can be fine-tuned with one additional layer to create the final task-specific models i.
Paper: Improving Language Understanding by Generative Pre-Training
Link: https://bit.ly/3xITvGP
Blog: https://openai.com/blog/language-unsupervised/ Authors: Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
Code: https://bit.ly/3gUFrUX
What? The paper proposes a semi-supervised technique that shows better performance on a wide variety of tasks like textual entailment, question answering, semantic similarity text classification by using a single task-agnostic model. The model can overcome the constraints of the small amount of annotated data for these specific tasks by performing an unsupervised generative-pretraining of a language model on a large diverse text corpus followed by supervised discriminative fine-tuning on each specific task.
Paper: Deep contextualized word representations
Link: https://arxiv.org/abs/1802.05365 Authors: Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
Code: https://bit.ly/3xpHNAI
Note - Since this is a relatively old paper, all the performance comparisons and state-of-the-art claims mentioned below should only be considered for the models at the time the paper was published.
What? The paper proposes a new type of deep contextualized word representation that helps to effectively capture the syntactic and semantic characteristics of the word along with the linguistic context of the word.
Our system for a Narural Language Generation based shared task organized at ACL 2018 (Association for Computational Linguistics, Melbourne, Australia).