language understanding | Shreyansh Singh

Paper Summary #4 - BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper: BERT - Pre-training of Deep Bidirectional Transformers for Language Understanding
Link: https://bit.ly/3bdTUra
Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Code: https://bit.ly/3vRXlM7

What?

The paper proposes BERT, which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained BERT model can then be fine-tuned with just one additional output layer to create the final task-specific models.
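To make "jointly conditioning on both left and right context" and "one additional output layer" concrete, here is a minimal pure-Python sketch, not BERT's actual implementation. It contrasts a bidirectional self-attention mask (every token may attend to every other token, as in BERT's encoder) with a left-to-right causal mask (token i sees only positions 0..i, as in a conventional left-to-right language model). The same mask would be applied in every layer. It also shows a single linear layer plus softmax on top of one vector, standing in for the final hidden state of BERT's special [CLS] token, which the paper uses as the aggregate sequence representation for classification. The function names, weights and inputs are all hypothetical, chosen for illustration.

```python
import math
from typing import List


def attention_mask(seq_len: int, bidirectional: bool) -> List[List[int]]:
    """Return a seq_len x seq_len mask; mask[i][j] == 1 means token i may attend to token j."""
    if bidirectional:
        # BERT-style encoder: every token sees the full left AND right context.
        return [[1] * seq_len for _ in range(seq_len)]
    # Left-to-right (causal) LM: token i sees only positions 0..i.
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]


def masked_softmax(scores: List[float], mask_row: List[int]) -> List[float]:
    """Softmax over attention scores, giving zero weight to masked-out positions."""
    top = max(s for s, m in zip(scores, mask_row) if m)  # subtract max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(cls_vector: List[float],
                        weights: List[List[float]],
                        bias: List[float]) -> List[float]:
    """Hypothetical single output layer: logits = W @ h_cls + b, then softmax over labels."""
    logits = [sum(w * h for w, h in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    return masked_softmax(logits, [1] * len(logits))


if __name__ == "__main__":
    print(attention_mask(3, bidirectional=True))   # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(attention_mask(3, bidirectional=False))  # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
    # With equal scores, the first token spreads attention over all three
    # positions under the bidirectional mask, but only over itself under the causal one.
    print(masked_softmax([0.0, 0.0, 0.0], [1, 1, 1]))
    print(masked_softmax([0.0, 0.0, 0.0], [1, 0, 0]))  # [1.0, 0.0, 0.0]
    # Two-label "fine-tuning" head on a toy 2-d [CLS] vector.
    print(classification_head([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

The contrast in the first row of each mask is the point. Under the causal mask the first token cannot see anything to its right, while under BERT's mask it conditions on the whole sentence. In a real fine-tuning setup, the head's weights would be learned jointly with the pre-trained encoder rather than fixed by hand as here.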