A collection of academic papers, blogs, talks, and projects that I read, watched, or explored during the month. I also include any small (or large) personal projects I did and any related ML/non-ML work.
Personal Projects
- Paper re-implementation - Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability by Cohen et al., 2021 - [Github]
- Paper re-implementation - The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks by Frankle et al., 2018 - [Github]
- Paper re-implementation - An Empirical Model of Large-Batch Training by OpenAI, 2018 - [Github]

Annotated Papers
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
- Modeling Language Usage and Listener Engagement in Podcasts
- Which Algorithmic Choices Matter at Which Batch Sizes?
Personal Projects
- VAE-Implementation - A simple implementation of Autoencoder and Variational Autoencoder - [Github]
- MinHash-Implementation - A simple MinHash implementation based on the explanation in the Mining of Massive Datasets course by Stanford - [Github]
- Paper re-implementation - Sentence VAE paper, “Generating Sentences from a Continuous Space” by Bowman et al.
Personal Projects
- Paper re-implementation - “Extracting Training Data from Large Language Models” by Carlini et al., 2021 - [Github]

Annotated Papers
- Learning Backward Compatible Embeddings
- Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
- Tracing Knowledge in Language Models Back to the Training Data

Papers I read
- On the Unreasonable Effectiveness of Feature Propagation in Learning on Graphs with Missing Node Features
- PaLM: Scaling Language Modeling with Pathways
- Hierarchical Text-Conditional Image Generation with CLIP Latents
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
- Unified Contrastive Learning in Image-Text-Label Space
- Improving Passage Retrieval with Zero-Shot Question Generation
- Exploring Dual Encoder Architectures for Question Answering
- Efficient Fine-Tuning of BERT Models on the Edge
- Fine-Tuning Transformers: Vocabulary Transfer
- Manipulating SGD with Data Ordering Attacks
- Differentially Private Fine-tuning of Language Models
- Extracting Training Data from Large Language Models
- Learning Backward Compatible Embeddings
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
- Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift
- Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
- Tracing Knowledge in Language Models Back to the Training Data

Blogs I read
- Domain Adaptation with Generative Pseudo-Labeling (GPL)
- Making Deep Learning Go Brrrr From First Principles
- Introduction to TorchScript
- Nonlinear Computation in Deep Linear Networks

Talks I watched
- How GPU Computing Works
Paper: Language Models are Unsupervised Multitask Learners
Link: https://bit.ly/3vgaVJc
Authors: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
Code: https://github.com/openai/gpt-2
I also made an annotated version of the paper, which you can find here.
What? The paper demonstrates that language models begin to learn NLP tasks like question answering, machine translation, reading comprehension and summarization without any explicit supervision. The results shown are obtained after training the model on a new dataset of millions of web pages called WebText.
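The tasks are induced purely through prompting; for instance, the paper appends "TL;DR:" after an article to elicit a summary from the model. As a rough illustration of that idea (my own sketch: the post contains no code, and the Hugging Face transformers library used here is an assumption, not the paper's original codebase), zero-shot prompting of the released GPT-2 weights might look like this:

```python
# Hedged sketch: zero-shot "summarization" with GPT-2 via prompting alone,
# mirroring the paper's TL;DR: trick. The Hugging Face transformers library
# is an assumption; the original work used OpenAI's own code.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

article = "The local council voted on Tuesday to expand the city's bike lane network..."
prompt = article + "\nTL;DR:"  # the paper uses this cue to induce summarization

out = generator(prompt, max_new_tokens=40, do_sample=False)
print(out[0]["generated_text"][len(prompt):])  # text generated after the prompt
```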
Paper: XLNet: Generalized Autoregressive Pretraining for Language Understanding
Link: https://arxiv.org/pdf/1906.08237.pdf
Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Code: https://github.com/zihangdai/xlnet
What? The paper proposes XLNet, a generalized autoregressive pretraining method that learns bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, thereby overcoming the limitations of BERT while retaining an autoregressive formulation. XLNet incorporates Transformer-XL as the underlying model and outperforms BERT on 20 NLP tasks such as question answering, natural language inference, sentiment analysis, and document ranking.
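Concretely, instead of predicting masked tokens, XLNet maximizes the expected autoregressive log-likelihood over sampled permutations z of the factorization order (this restates the objective from the paper):

$$
\max_{\theta} \; \mathbb{E}_{\mathbf{z} \sim \mathcal{Z}_T}\left[\sum_{t=1}^{T} \log p_{\theta}\!\left(x_{z_t} \mid \mathbf{x}_{\mathbf{z}_{<t}}\right)\right]
$$

Because the parameters are shared across all factorization orders, each position learns to use context from both sides without ever seeing corrupted [MASK] inputs.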
Paper: BERT - Pre-training of Deep Bidirectional Transformers for Language Understanding
Link: https://bit.ly/3bdTUra
Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Code: https://bit.ly/3vRXlM7
What? The paper proposes BERT, which stands for Bidirectional Encoder Representations from Transformers. BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. The pre-trained BERT model can then be fine-tuned with just one additional output layer to create task-specific models.
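As a hedged sketch of what "one additional output layer" means in practice (my own illustration; the Hugging Face transformers library and the two-label setup are assumptions, not the paper's original codebase), fine-tuning for sentence classification only adds a small linear head on top of the pre-trained encoder:

```python
# Hedged sketch: a single classification head on top of pre-trained BERT.
# Library choice (Hugging Face transformers) and label count are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # pre-trained encoder + new linear head
)

batch = tokenizer(["a delightful little film", "painfully dull"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

loss = model(**batch, labels=labels).loss  # fine-tune the whole model from this loss
loss.backward()
```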
Paper: Improving Language Understanding by Generative Pre-Training
Link: https://bit.ly/3xITvGP
Blog: https://openai.com/blog/language-unsupervised/
Authors: Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
Code: https://bit.ly/3gUFrUX
What? The paper proposes a semi-supervised technique that performs well on a wide variety of tasks like textual entailment, question answering, semantic similarity, and text classification using a single task-agnostic model. The model overcomes the constraint of having only small amounts of annotated data for these specific tasks by performing unsupervised generative pre-training of a language model on a large, diverse text corpus, followed by supervised discriminative fine-tuning on each specific task.
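The two stages are combined explicitly in the paper: after pre-training with a standard language modeling objective L1, fine-tuning optimizes the supervised objective L2 together with L1 as an auxiliary loss weighted by λ, which the authors report helps generalization:

$$
L_3(\mathcal{C}) = L_2(\mathcal{C}) + \lambda \cdot L_1(\mathcal{C})
$$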
Paper: Deep contextualized word representations
Link: https://arxiv.org/abs/1802.05365
Authors: Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
Code: https://bit.ly/3xpHNAI
Note - Since this is a relatively old paper, all the performance comparisons and state-of-the-art claims mentioned below should be read relative to the models available at the time the paper was published.
What? The paper proposes a new type of deep contextualized word representation that captures both the syntactic and semantic characteristics of word use and how these uses vary across linguistic contexts (i.e., polysemy).
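Concretely, ELMo represents each word as a task-specific weighted combination of the internal layers of a pre-trained bidirectional language model rather than using only the top layer; restating the combination from the paper:

$$
\mathrm{ELMo}_k^{task} = \gamma^{task} \sum_{j=0}^{L} s_j^{task}\, \mathbf{h}_{k,j}^{LM}
$$

where the s_j are softmax-normalized layer weights learned for the downstream task and γ scales the whole vector.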
Paper: Attention Is All You Need
Link: https://bit.ly/3aklLFY
Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
Code: https://github.com/tensorflow/tensor2tensor
What? Proposes the Transformer, a simple new architecture for sequence transduction that relies entirely on attention mechanisms, dispensing with recurrence and convolutions. The model achieves SOTA (at the time) on the WMT 2014 English-to-French translation task with a BLEU score of 41.8.
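At the core of the architecture is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. A minimal NumPy sketch (my own illustration, not the paper's reference code; shapes and the toy inputs are assumptions):

```python
# Minimal sketch of scaled dot-product attention from "Attention Is All You Need".
# The toy shapes below are illustrative assumptions.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
    return weights @ V                                # weighted sum of values

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # 4 positions, d_k = 8
print(scaled_dot_product_attention(Q, K, V).shape)      # (4, 8)
```

Multi-head attention in the paper simply runs several such attention functions in parallel on learned linear projections of Q, K, and V and concatenates the results.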
Our system for a Natural Language Generation shared task organized at ACL 2018 (Association for Computational Linguistics, Melbourne, Australia).