Machine Learning

Academic Log | June/July 2022

A collection of academic papers/blogs/talks/projects that I read/watched/explored during the month. I also include any small (or large) personal projects that I did and any related ML/non-ML work.

Personal Projects
- Paper re-implementation - “Extracting Training Data from Large Language Models” by Carlini et al., 2021. - [Github]

Annotated Papers
- Learning Backward Compatible Embeddings
- Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
- Tracing Knowledge in Language Models Back to the Training Data

Papers I read
- On the Unreasonable Effectiveness of Feature propagation in Learning on Graphs with Missing Node Features
- PaLM: Scaling Language Modeling with Pathways
- Hierarchical Text-Conditional Image Generation with CLIP Latents
- Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
- Unified Contrastive Learning in Image-Text-Label Space
- Improving Passage Retrieval with Zero-Shot Question Generation
- Exploring Dual Encoder Architectures for Question Answering
- Efficient Fine-Tuning of BERT Models on the Edge
- Fine-Tuning Transformers: Vocabulary Transfer
- Manipulating SGD with Data Ordering Attacks
- Differentially Private Fine-tuning of Language Models
- Extracting Training Data from Large Language Models
- Learning Backward Compatible Embeddings
- Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
- Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift
- Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
- Tracing Knowledge in Language Models Back to the Training Data

Blogs I read
- Domain Adaptation with Generative Pseudo-Labeling (GPL)
- Making Deep Learning Go Brrrr From First Principles
- Introduction to TorchScript
- Nonlinear Computation in Deep Linear Networks

Talks I watched
- How GPU Computing Works

ConvNeXt - Adversarial images generation

I implemented [Stanislav Fort's project](https://twitter.com/stanislavfort/status/1481263565998805002?s=20) in PyTorch. The Github repo has a notebook which looks at generating adversarial images to 'fool' the ConvNeXt model's image classification capabilities. ConvNeXt came out earlier this year (2022) from Meta AI. FGSM (Fast Gradient Sign Method) is a simple but effective algorithm for attacking models in a white-box fashion with the goal of misclassification. Noise is added to the input image, not randomly, but in the direction of the gradient of the cost function with respect to the input data.
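As a rough illustration (a minimal sketch, not the exact notebook code; `model`, `image`, and `label` are assumed to be a pretrained classifier, a batched input tensor in [0, 1], and its true class index):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Perturb `image` along the sign of the gradient of the loss w.r.t. the input."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Take a step of size epsilon in the direction of the input gradient's sign
    adv_image = image + epsilon * image.grad.sign()
    # Keep pixel values in the valid [0, 1] range (assumption about the input scale)
    return adv_image.clamp(0, 1).detach()
```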

Deploying Machine Learning models using GCP's Google AI Platform - A Detailed Tutorial

In my last post I had written about deploying models on AWS. So, I thought it would only be fitting to write one for GCP, for all the GCP lovers out there. GCP has a service called the AI Platform which, as the name suggests, is responsible for training and hosting ML/AI models. Just like the last post, this post, through a PoC, describes how to:
- Add a trained model to a Google Cloud Storage bucket
- Host the saved model on the AI Platform
- Create a Service Account to use the model hosted on AI Platform externally
- Make a Streamlit app as a UI to access the hosted model

All the code can be found in my Github repository.
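For a feel of the last step, calling a hosted model externally from Python typically goes through the AI Platform prediction API. This is a sketch using the `googleapiclient` library; the project name, model name, and instance values are placeholders, not the values from the post:

```python
from googleapiclient import discovery

# Assumes GOOGLE_APPLICATION_CREDENTIALS points to the Service Account key file
service = discovery.build("ml", "v1")
name = "projects/my-project/models/my_model"  # placeholder project/model names

response = (
    service.projects()
    .predict(name=name, body={"instances": [[5.1, 3.5, 1.4, 0.2]]})  # placeholder input
    .execute()
)
print(response["predictions"])
```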

Deploying Machine Learning models using AWS Lambda and Github Actions - A Detailed Tutorial

Quite a while back, I had written a post in which I described how to package your Machine Learning models using Docker and deploy them using Flask. This post, through a PoC, describes how to:
- Package your model using Docker (similar to the last post)
- Push the Docker container to Amazon ECR
- Add a Lambda Function for your model
- Make a REST API using Amazon API Gateway to access your model
- Automate the whole process using Github Actions, so that any updates to the model can take effect immediately
- Make a Streamlit app as a UI to access the REST API (for the model deployed on AWS)

All the code can be found in my Github repository.
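At its core, the Lambda side is just a handler that loads the model once per container and serves predictions. This is a minimal sketch, not the repository's exact code; the artifact name and input format are assumptions:

```python
import json
import joblib

# Loaded once at cold start and reused across invocations
model = joblib.load("model.joblib")  # placeholder artifact name

def handler(event, context):
    """Entry point configured as the Lambda handler (e.g. in the Dockerfile CMD)."""
    body = json.loads(event.get("body", "{}"))
    prediction = model.predict([body["features"]]).tolist()
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```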

PPML Series #3 - Federated Learning for Mobile Keyboard Prediction

Introduction

Gboard, the Google keyboard, is a virtual keyboard for smartphones with support for more than 900 language varieties and over 1 billion installs. In addition to decoding noisy signals from input modalities including tap and word-gesture typing, Gboard provides auto-correction, word completion, and next-word prediction features. Next-word prediction facilitates text entry and plays an important role in improving the user experience. Based on a small amount of user-generated preceding text, language models (LMs) can predict the most probable next word or phrase.
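As a toy illustration of next-word prediction (not Gboard's actual model, which the post discusses in a federated setting), even a simple count-based language model can rank candidate next words given the preceding word; the corpus below is made up:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()  # toy data

# Count how often each word follows a given word (bigram counts)
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word, k=3):
    """Return the k most frequent next words after `word`."""
    return [w for w, _ in bigrams[word].most_common(k)]

print(predict_next("the"))  # e.g. ['cat', 'mat', 'fish']
```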

PPML Series #2 - Federated Optimization Algorithms - FedSGD and FedAvg

In my last post, I covered a high-level overview of Federated Learning, its applications, advantages & challenges. We also went through, at a high level, how Federated Optimization algorithms work. But mathematically, how is Federated Learning training actually performed? That’s what we will be looking at in this post. There was a paper, Communication-Efficient Learning of Deep Networks from Decentralized Data by Google (3637 citations!!!), in which the authors proposed a federated optimization algorithm called FedAvg and compared it with a naive baseline, FedSGD.
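The aggregation step at the heart of FedAvg is a weighted average of the clients' locally updated parameters, weighted by each client's number of training samples. A minimal NumPy sketch (illustrative, not the paper's notation):

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Aggregate per-client parameter vectors into new global parameters.

    client_weights: list of np.ndarray, one locally trained parameter vector per client
    client_sizes:   list of int, number of training samples on each client
    """
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Toy example: two clients holding different amounts of data
global_w = fed_avg([np.array([1.0, 2.0]), np.array([3.0, 4.0])], [10, 30])
print(global_w)  # [2.5, 3.5]
```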

PPML Series #1 - An introduction to Federated Learning

Motivation

Privacy-preserving Machine Learning has always been exciting to me. Since my B.Tech. thesis, which involved PPML (SMPC + Computer Vision), I hadn’t had a chance to work on it. So, after about 2 years, I have started reading about it again and am sharing it with the community. Federated Learning is a domain I had somewhat avoided during my thesis; I had some idea about the topic but didn’t get into it much.

ML Optimizers in JAX

Implementations of some popular optimizers from scratch for a simple model, i.e., Linear Regression, on a dataset of 5 features. The goal of this project was to understand how these optimizers work under the hood and to do a toy implementation myself. I also use a bit of JAX magic to differentiate the loss function w.r.t. the weights and the bias without explicitly writing their derivatives as a separate function. This helps generalize the notebook to other types of loss functions as well.
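The "JAX magic" here is `jax.grad`, which differentiates the loss w.r.t. the parameters automatically. A small illustrative sketch (not the notebook verbatim; the data below is dummy):

```python
import jax.numpy as jnp
from jax import grad

def loss_fn(params, X, y):
    """Mean-squared error for linear regression."""
    w, b = params
    preds = X @ w + b
    return jnp.mean((preds - y) ** 2)

# Gradients of the loss w.r.t. the (weights, bias) tuple, no manual derivatives
grad_fn = grad(loss_fn)

def sgd_step(params, X, y, lr=0.01):
    grads = grad_fn(params, X, y)
    return tuple(p - lr * g for p, g in zip(params, grads))

# Toy usage on a dataset with 5 features
X = jnp.ones((8, 5))
y = jnp.ones(8)
params = (jnp.zeros(5), 0.0)
params = sgd_step(params, X, y)
```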

Paper Summary #6 - Language Models are Unsupervised Multitask Learners

Paper: Language Models are Unsupervised Multitask Learners
Link: https://bit.ly/3vgaVJc
Authors: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever
Code: https://github.com/openai/gpt-2

I also made an annotated version of the paper, which you can find here.

What?

The paper demonstrates that language models begin to learn NLP tasks like question answering, machine translation, reading comprehension and summarization without any explicit supervision. The results shown are obtained after training the model on a new dataset of millions of web pages called WebText.
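To get a feel for this zero-shot behaviour, the released GPT-2 weights can be prompted directly, e.g. via the Hugging Face transformers library (an illustrative sketch, unrelated to the paper's original code; the prompt is made up):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Prompting the model as if it were a translation system, with no fine-tuning
prompt = "Translate English to French: cheese =>"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```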

Paper Summary #5 - XLNet: Generalized Autoregressive Pretraining for Language Understanding

Paper: XLNet: Generalized Autoregressive Pretraining for Language Understanding
Link: https://arxiv.org/pdf/1906.08237.pdf
Authors: Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Code: https://github.com/zihangdai/xlnet

What?

The paper proposes XLNet, a generalized autoregressive pretraining method that learns bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, and overcomes the limitations of BERT thanks to its autoregressive formulation. XLNet incorporates Transformer-XL as the underlying model. It outperforms BERT on 20 NLP tasks, including question answering, natural language inference, sentiment analysis and document ranking.
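A quick intuition for the permutation-based objective: for each training sequence, a random factorization order is sampled, and the model predicts each token conditioned only on the tokens that precede it in that order. This is a conceptual sketch of which positions are visible at each step, not XLNet's actual two-stream attention implementation:

```python
import random

tokens = ["New", "York", "is", "a", "city"]  # toy sequence

# Sample one factorization order (a permutation of positions)
order = list(range(len(tokens)))
random.shuffle(order)

for t, pos in enumerate(order):
    context_positions = order[:t]  # positions already "seen" under this order
    context = [tokens[p] for p in sorted(context_positions)]
    print(f"predict '{tokens[pos]}' at position {pos} given context {context}")
```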