-
Investigating Vision Transformer-Based Models for Closure Modeling of Fluid Dynamical Systems
Project Report for 6.S898 Deep Learning (Fall 2023)
-
Are Watermarked Large Language Models More Prone to Hallucinations?
In this blog post, I investigate whether watermarked LLMs are more likely to “hallucinate,” or make up facts, because of limitations imposed by the watermarking scheme.
-
Predicting the Future: LSTM vs Transformers for Time Series Modeling
A comparative analysis of LSTM and Transformer models in the context of time-series forecasting. While LSTMs have long been a cornerstone, the advent of Transformers has sparked significant interest due to their attention mechanisms. In this study, we pinpoint which particular features of time series datasets could lead transformer-based models to outperform LSTM models.
-
Studying the benefits and limitations of sparse auto-encoders for compositional reasoning tasks
-
Solvent Encoding for Solubility Prediction Using GNNs
Evaluation of different solvent-encoding methods on a publicly available solubility dataset
-
6.S898 Final Project: Investigating the biological underpinnings of latent embeddings for scRNA-seq
-
Forbidden Facts
A Mechanistic Interpretability Investigation of Llama 2
-
Modeling Elephantfish Communication through Deep RNNs
Elephantfish represent a fascinating subject for study within the realms of bioacoustics and animal communication due to their unique use of electric fields for sensing and interaction. This project proposes the development of a deep learning framework to model the electrical communication signals of elephantfish, akin to language models used in natural language processing (NLP).
-
Exploring Image-Supervised Contrastive Diffusion - A Comparative Analysis with Applications in Image-to-Video Generation
Image-to-image (I2I) and image-to-video (I2V) may be the next frontier of generative deep learning capabilities, but current models struggle with robustness, largely due to the implicit, rather than explicit, representation learning objective during traditional diffusion model training. Hence, we propose a new technique where a custom contrastive loss function is used to leverage the innate latent space of the diffusion model’s variational autoencoder. This enables us to study the creation of lightweight models that lose less contextual information between input conditioning and target output, which we elucidate in this blog.
-
Combining Modalities for Better Molecular Representation Learning
-
Exploring Frobenius and Spectral Normalization in MLPs and Residual Networks
This blog post compares the effects of a spectral view of weight normalization to a Frobenius view of weight normalization, using a novel algorithm we developed. We use two network types at multiple sizes to compare the effects of these two methods on the singular values of the weight matrices, the rank of the weight matrices, and the accuracy of the models.
-
Iterated Representation Learning
Representation learning is a subfield of deep learning focused on learning meaningful lower-dimensional embeddings of input data, and it is rapidly rising in popularity for its efficacy with generative models. However, most representation learning techniques, such as autoencoders and variational autoencoders, learn only one embedding from the input data, which is then used to either reconstruct the original data or generate new samples. This project seeks to study the utility of a proposed iterated representation learning framework, which repeatedly trains new latent space embeddings based on the data output by the previous round of representation. In particular, we seek to examine whether the performance of this iterated approach on a model and input dataset is indicative of any robustness qualities of the model and latent embedding space, and potentially derive a new framework for evaluating representation stability.
-
A Method for Alleviating Catastrophic Forgetting With Explainability
Using various explainability metrics to identify which layers to target, we freeze layers in CNNs to enable continual learning.
-
Graph Articulated Objects
Pre-trained large vision-language models (VLMs), such as GPT4-Vision, uniquely encode relationships and contextual information learned about the world from copious amounts of real-world text and image data. Within the context of robotics, the recent explosion of advancements in deep learning has enabled innovation on all fronts when solving the problem of generalized embodied intelligence. Teaching a robot to perform any real-world task requires it to perceive its environment accurately, plan the steps to execute the task at hand, and accurately control the robot to perform the given task. This project explores the use of vision-language models to generate domain descriptions. These can be used for task planning, closing the gap between raw images and a semantic understanding of the interactions possible within an environment.
-
Physics Loss
-
Diffusion Models on Low-Brightness Images
Diffusion models have been used with great success for a number of use cases, but they remain largely unused on dim images. The primary related work has been on using a diffusion model for low-light image enhancement. However, most of these works agree that attempting to generate an image from noise added on top of an already dim image often results in RGB shift and global degradation of the image. This is because a diffusion model adds noise to the given image and then attempts to denoise it, so given a dim and low-contrast image, the model has a difficult time denoising. This blog post focuses on methods to improve diffusion model performance on low-light images.
-
Semi-Supervised Domain Adaptation using Diffusion Models
6.S898 Project
-
The Effect of Activation Functions On Superposition in Toy Models
An in-depth exploration of how different activation functions influence superposition in neural networks.
-
Stable Diffusion for Oracle Bone Script
The project aims to train a ControlNet for Stable Diffusion on the condition of rendering traditional Chinese characters from oracle bone script samples.
-
Gradient-Boosted Neural Wavelet Interpolation for Time Series (G-BiTS)
-
Challenges in Deep Learning Surrogates for Constrained Linear Optimization
Learning a deep net to optimize an LP, based on predicting the optimal basis vector. Surveys existing approaches in the literature. Demonstrates high accuracy of feasibility and optimality on small problem instances, but documents issues when scaling to larger problems. Benchmarks against a modern optimization solver, with discussions on upfront training vs. variable inference computation times.
-
Activation Patching in Vision Transformers
-
Transformer-Based Approaches for Hyperspectral Imagery in Remote Sensing
The introduction of transformer-based models in remote sensing signals a transformative shift in hyperspectral image (HSI) classification, providing advanced tools to navigate the complex data landscape. This investigation gauges the potential of vision transformers to accurately discern the detailed spectral and spatial correlations within HSI, accentuating their capacity to significantly improve detection and analysis in environmental monitoring and land management.
-
Learning Generals.io
We explore the application of deep learning to the online game generals.io and discuss what is necessary to achieve superhuman performance.
-
A Comparative Study of Transformers on Long-Sequence Time Series Data
This study evaluates Transformer models for traffic flow prediction. Focusing on long-sequence time-series data, it examines the balance between computational efficiency and accuracy, suggesting potential combinations of methods for improved forecasting.
-
Transfer Resistant Model Training
This blog post details our work on training neural networks that are resistant to transfer learning techniques.
-
Sparse Autoencoders for a More Interpretable RLHF
Extending Anthropic's recent monosemanticity results toward a new, more interpretable way to fine-tune.
-
Using Synthetic Data to Minimize Real Data Requirements
Data acquisition for some tasks in synthetic biology can be cripplingly difficult to perform at a scale necessary for machine learning... so what if we just made our data up?*
-
Applications of Deep Learning in Timbre Transfer
Exploring musical timbre transfer by leveraging prior art in differential digital signal processing (DDSP) and modern deep learning structures.
-
Training Robust Networks
Exploring ResNet on TinyImageNet, unveiling brittleness and discovering simple robustness enhancement strategies via hyperparameter optimization
-
Imposing uniformity through Poisson flow models
Uniformity and alignment are used to explain the success of contrastive encoders. Can we use already trained, well-aligned features and impose uniformity to increase their quality and performance on downstream classification tasks?
-
6-DOF estimation through visual place recognition
A neural pose-estimation solution is implemented, which could help an agent with a downward-facing camera (such as a drone) to geolocate based on prior satellite imagery of terrain. The neural encoder infers extrinsic camera parameters from camera images, enabling estimation of 6 degrees of freedom (6-DOF), namely 3-space position and orientation. By encoding priors about satellite imagery in a neural network, the need for the agent to carry a satellite imagery dataset onboard is avoided.
-
Tracing the Seeds of Conflict: Advanced Semantic Parsing Techniques for Causality Detection in News Texts
This blog post outlines a research project aiming to uncover cause-effect relationships in the sphere of (political) conflicts using a frame-semantic parser.
-
To Encode or Not To Encode: The Case for the Encoder-free Autodecoder Architecture
While the traditional autoencoder architecture consists of an encoder and a decoder to compress and reconstruct information with only the most prominent features, some recent works have begun to utilize an alternate framework, the autodecoder, for specific applications in the field of representation learning. Skipping the encoder network altogether and learning latent codes directly as parameters, we aim to compare the two architectures on practical reconstruction tasks as well as dive into the theory of autodecoders and why they work, along with certain novel features that they bring.
-
New Synthesis Approach for Personalized LLMs
-
Augmenting Expert Domain Image Inputs for Enhancing Visual Language Models Performance
This blog post explores enhancing visual language models, particularly for expert domains like scientific literature, where standard models struggle. By integrating domain-specific knowledge and advanced image embeddings, the research aims to refine the performance of visual language models such as OpenFlamingo. Leveraging graphical structured embeddings and graph neural networks, the study tests different methods of representing images to improve the models' interpretive capabilities.
-
Embeddings for Spatio-temporal Forecasting
An analysis of various embedding methods for spatio-temporal forecasting.
-
In the pursuit of cheap and robust word embeddings
A study of how we can train a student word embedding model to mimic the teacher OpenAI word embedding model by using as small a training set as possible. We also investigate preprocessing tricks and robustness against poisoned data.
-
Leveraging Representation Engineering for LLMs' In-Context Learning
We present a method to observe from model internals whether LLMs are performing in-context learning, and to control the model outputs based on such Context Vectors.
-
Reasoning with Maps: Assessing Spatial Comprehension on Maps in Pre-trained Models
Map reasoning is an intuitive skill for humans and a fundamental capability with important applications in many domains. In this project, we aim to evaluate the capabilities of contemporary state-of-the-art Large Vision-Language Models (LVLMs) for reasoning on maps and to compare their capabilities with those of human participants on the coregistration task. We additionally propose and release a novel dataset to serve as an initial benchmark for map reasoning capabilities. We run an extensive analysis of the performance of open-source LVLMs, showing that they struggle to achieve good performance on our dataset. Additionally, we show that coregistration is intuitive for human participants, who were able to achieve close-to-perfect accuracy under time constraints.
-
Autoen-chorder: Predicting Musical Success With Neural Nets
In this blog, we discuss deep learning methods and results of predicting song popularity from audio features.
-
Ensemble Learning for Mitigating Double Descent
Exploring when and why Double Descent occurs, and how to mitigate it through Ensemble Learning.
-
Injecting Node Information via Embedding Initializations
Graph Neural Networks (GNNs) have revolutionized our approach to complex data structures, enabling a deeper understanding of relationships and patterns that traditional neural networks might miss. This project looks into the potential of embedding initializations in GNNs, particularly in the context of molecular function prediction and protein retrieval tasks. By investigating the effect of intentional, information-rich initializations versus random initializations, we aim to enhance the learning efficiency and accuracy of GNNs in these domains. Our study focuses on a precision medicine knowledge graph (PrimeKG) and employs TxGNN, a GNN model initially designed for disease-drug link prediction, repurposed for protein-molecular function link prediction. We explore the impact of using ESM embeddings for protein nodes, hypothesizing that these embeddings could provide structural information not explicitly present in the graph data. Through comparisons of the latent spaces and performances, we assess the effectiveness of these embeddings in improving the model's predictive power for protein function.
-
Overparameterization of Neural Networks through Kernel Regression and Gaussian Processes
In this work, we will explore the successes of overparameterization of neural networks through evaluating the relationship between the Neural Tangent Kernel (NTK), MLPs, and Gaussian processes.
-
Exploring Methods for Generating Music
Explores various machine learning techniques for generating music. Compares the performance of traditional RNNs, LSTMs, and transformers on generating sample sequences of music.
-
Can Contrastive Learning Recommend Me a Movie?
-
Improving CLIP Spatial Awareness Using Hard Negative Mining
CLIP struggles to understand and reason spatially. We attempt to solve this issue by introducing hard negative examples during training.
-
Multimodal Commonsense
6.S898 project for analyzing and evaluating the commonsense reasoning performance of multimodal vs text-only models.
-
Exploring Univariate Time Series Anomaly Detection using VAEs
In this blog post, we will take a deep dive into DONUT, a method that applies variational autoencoders to the problem of time series anomaly detection. We will begin with an overview of the original authors' main ideas. Next, we will replicate some results and perform new experiments to gain further insights into the properties, successes, and limitations of this method. Finally, we will run additional experiments that test extensions of the original formulation and motivate future areas of exploration.
-
Graph Transformers
A study of Transformers' understanding of fundamental graph problems, where we propose a new, tailored architecture highlighting the model's potential in graph-related tasks.
-
Learning a Lifted Linearization for Switched Dynamical Systems
A final project proposal for 6.S898 in Fall 2023
-
Sparse Autoencoder Universality - Under What Conditions are Learned Features Consistent?
This project aims to study the universality of features in LLMs by studying sparse autoencoders trained on similar layers of different models.
-
Optimizations of Transformers for Small-scale Performance
CNNs generally outperform ViTs in scenarios with limited training data. However, the narrative switches when the available training data is extensive. To bridge this gap and improve upon existing ViT methods, we explore how we can leverage recent progress in the transformer block and exploit the known structure of pre-trained ViTs.
-
Guided Transfer Learning and Learning How to Learn: When Is It Useful?
For downstream tasks that involve extreme few-shot learning, it's often not enough to predispose a model with only general knowledge using traditional pre-training. In this blog, we explore the nuances and potential applications of Guided Transfer Learning, a meta-learning approach that allows a model to learn inductive biases on top of general knowledge during pre-training.
-
Alive Scene
Inspired by the captivating Enchanted Portraits of the Harry Potter universe, my project unveils an innovative AI pipeline that transcends traditional scene-capture methods. Rather than merely recording scenes as a sequence of static images, this pipeline is intricately designed to interpret and articulate the dynamic behavior of various elements within a scene by utilizing CLIP semantic embeddings. This nuanced understanding enables the scenes to evolve autonomously and organically, mirroring the fluidity and spontaneity of living entities.
-
Projected fast feedforward networks
-
Understanding Linear Mode Connectivity
We study the pruning behavior of vision transformers (ViTs) and possible relations to linear mode connectivity. Frankle et al. (2022) showed that linear mode connectivity, the tendency of a neural network to optimize to the same linearly connected minimum when trained with SGD noise, is strongly tied to the existence of "lottery networks," sparse networks that can be trained to full accuracy. We found that when initialized from a pretrained network, the ViT model showed linear mode connectivity when fine-tuning on CIFAR-10. Conversely, random initialization resulted in instability during training and a lack of linear mode connectivity. We also found that using the PLATON algorithm (Zhang et al.) to generate a mask was effective for pruning the network, suggesting the existence of lottery ticket networks in ViTs, but the connection between the existence of these trainable subnetworks and linear mode connectivity remains unclear.
-
Transformers vs. RNNs: How do findings from real-world datasets relate to the theory?
Transformers have rapidly surpassed RNNs in popularity due to their efficiency via parallel computing, without sacrificing accuracy. Transformers are seemingly able to perform better than RNNs on memory-based tasks without keeping track of that recurrence. This leads researchers to wonder: why? To contribute toward answering that question, I'll analyze the performance of transformer- and RNN-based models on datasets from real-world applications. Serving as a bridge between applications and theory-based work, this will hopefully enable future developers to better decide which architecture to use in practice.
-
Exploring the latent space of text-to-image diffusion models
In this blog post we explore how we can navigate the latent space of Stable Diffusion using interpolation techniques.
-
Accelerating large model inference with speculative decoding - 6.S898
An investigation into methods to speed up autoregressive inference through increased parallelization, specifically through speculative sampling and decoding.
-
Unraveling Social Reasoning in LLMs: A Deep Dive into the Social IQA Benchmark
In this study, we investigate the challenge of social commonsense reasoning in large language models (LLMs), aiming to understand and categorize common errors LLMs make in social commonsense reasoning tasks.
-
Comparing data augmentation using VAEs and denoising-VAEs for limited noisy datasets
-
Emoji3Vec
Our project seeks to expand on the previous attempts at "emoji2vec", or generating semantically meaningful embeddings for emojis.
-
Modeling Human Speech Recognition with Different Network Architectures
Evaluating a neural network's ability to effectively model human speech recognition using CNNs vs. TNNs
-
Analytic, Empirical, and Monte Carlo Bayesian Methods for Uncertainty Estimation
In the realm of machine learning, the robustness and reliability of predictive models are important, especially when confronted with Out-of-Distribution (OOD) data that deviate from the training distribution. Bayesian models stand out for their probabilistic foundations, being able to offer ways to quantify uncertainty. This project will present a survey of already-established methods of estimating uncertainty, as well as how we adapted/generalized them.
-
Understanding LLM Attention on Useless Numbers in Word Problems (and this Title has 8 Es)
If Jack starts out with 4 llamas and Jill takes 2 of them, then Jack gets 5 chinchillas, how many llamas does he have?
-
Cross-Lingual Fine-Tuning for Multilingual Text Embeddings
Exploring contrastive training of text embeddings, and presenting a scalable, cheap, and data-efficient method to train multilingual embedding models
-
Learning Interpretable Features with Sparse Auto-Encoders
-
How does model size impact catastrophic forgetting in online continual learning?
Yes, model size matters.
-
VGAE Clustering of the Fruit Fly Connectome
An exploration of how learned Variational Graph Auto-Encoder (VGAE) embeddings compare to Spectral Embeddings to determine the function of neurons in the fruit fly brain.
-
Robust Image to Video Generation Using Contrastive Diffusion Over Latents
Image-to-video (I2V) may be the next frontier of generative deep learning capabilities, but current models struggle with robustness, largely due to the implicit, rather than explicit, representation learning objective during traditional diffusion model training. Hence, we propose a new technique where a pre-trained contrastive model is used to train a diffusion model with a custom contrastive loss function to operate within a learned structured latent space for I2V problems, yielding, in theory, more structurally sound videos without loss of contextual information.
-
Adaptive Controller with Neural Net Equations of Motion for High-DOF Robots
This project aims to develop an adaptive control mechanism using a graph neural network to approximate the equations of motion (EoM) for high-degree-of-freedom (DOF) robotic arms, bypassing the need for symbolic EoM to build an adaptive controller.
-
Robustness of self-supervised ViT features in b-mode images
Vision Transformers (ViT) trained with self-distillation with no labels (DINO) have shown striking properties for several downstream tasks regarding segmentation, classification, and image correspondence. In this work, we assess DINO-vit-s/8 on a new dataset containing b-mode ultrasound images with the ultimate goal of segmenting bone.
-
Investigating the Impact of Symmetric Optimization Algorithms on Learnability
Recent theoretical papers in machine learning have raised concerns about the impact of symmetric optimization algorithms on learnability, citing hardness results from theoretical computer science. This project aims to empirically investigate and validate these theoretical claims by designing and conducting experiments, as understanding the role of optimization algorithms in the learning process is crucial for advancing the field of machine learning.
-
Can CNNs learn shapes?
One widely accepted intuition is that Convolutional Neural Networks trained for object classification combine low-level features (e.g. edges) to gradually learn more complex and abstract patterns that are useful in differentiating images. Yet it remains poorly understood how CNNs actually make their decisions, and how their recognition strategies differ from those of humans. Specifically, there is a major debate about whether CNNs primarily rely on surface regularities of objects, or whether they are capable of exploiting the spatial arrangement of features, similar to humans.
-
Quantum Circuit Optimization with Graph Neural Nets
We perform a systematic study of architectural choices of graph neural net-based reinforcement learning agents for quantum circuit optimization.
-
Structural vs Data Inductive Bias
Class project proposal
-
From Scroll to Misbelief - Modeling the Unobservable Susceptibility to Misinformation on Social Media
-
Examining assumptions in scRNA-seq foundation model pre-training (6.S898 Final Project)
Final project for MIT's Deep Learning (6.S898) class.
-
Increasing Context Length For Transformers
How can we make attention more efficient?
-
Zero-Shot Machine-Generated Image Detection using Sinks of Gradient Flows
How can we detect fake images online? A novel approach of characterizing the behavior of a diffusion model's learned score vectors.
-
Denoising EMG signals
The future of brain-computer interfaces rests on our ability to decode neural signals. Here we attempt to ensemble ML techniques to extract useful information from sEMG signals to improve downstream task performance.
-
A Deeper Look into Equivariance for Materials Data
A Comparative Analysis of an E(3) Equivariant GNN and a Non-Equivariant GNN in Materials Data Tasks with a Focus on Investigating the Interpretability of Latent Geometry within the Two GNNs.
-
Prompt to Prompt
Text-based image editing via cross-attention mechanisms - the research of hyperparameters and novel mechanisms to enhance existing frameworks
-
Understanding Bias in Speech to Text Language Models
Do language models have biases that make them better suited to Latin-script languages like English? To find out, we generate a custom dataset to test how various language features, like silent letters, letter combinations, and letters out of order, affect how speech-to-text models learn, and compare these results with models trained on real human language.
-
Regularization Techniques for Attention Layers in Transformer Models
Attention layers are an integral part of the success of transformer models, but can also lead to overfitting on parts of input data when there is limited training data. Therefore, researchers have proposed methods to regularize attention layers to reduce overfitting and increase generalizability. This blog will analyze popular methods and explore novel approaches to regularization in attention layers.
-
Neural PDEs for learning local dynamics and longer temporal rollouts
6.S898 deep learning project
-
Graph neural networks vs. transformers for geometric graphs
With the recent development of graph transformers, in this project we aim to compare their performance on a molecular task of protein-ligand binding affinity prediction against the performance of message passing graph neural networks.
-
An empirical evaluation of autoencoders and diffusion models for 2D small-molecule generation
We examine the efficacy of autoencoders and diffusion models for generating 2D molecules with certain small-molecule properties. In particular, we evaluate the success of both models in creating new molecules containing only CHONPS atoms and only single, double, and aromatic bonds. Secondarily, a natural question that followed was the efficacy of different ways of encoding molecular data for training: specifically, we trained with both molecular fingerprints and adjacency matrices (derived from graph embeddings of molecules). We find that small autoencoder models are successful in generating both pseudo-fingerprints and pseudo-adjacency matrices that are similar to simple small molecules' fingerprints and adjacency matrices, but they were not able to produce 'convincing' simple organic molecules from the fingerprint or adjacency matrices. We also find that diffusion models were considerably faster and more lightweight than autoencoders, and generated molecules that were quantitatively closer in structure to real chemical structures than those the autoencoders were able to produce.
-
VIVformer
A deep transformer framework trained on real experimental and synthetic gen-AI data for forecasting non-stationary time-series. Applications and insights drawn from vortex induced vibrations data collected at the MIT Towing Tank.
-
Recovering Latent Variables with VAEs despite Training Bias
Final Project Blog
-
Recurrent Recommender System with Incentivized Search
This project considers the use of Recurrent Neural Networks (RNNs) in session-based recommender systems. We input sequences of customers' behavior, such as browsing history, to predict which product they're most likely to buy next. Our model improves upon this by taking into account how previous recommendations influence subsequent search behavior, which then serves as our training data. Our approach introduces a multi-task RNN that not only aims to recommend products with the highest likelihood of purchase but also those that are likely to encourage further customer searches. This additional search activity can enrich our training data, ultimately boosting the model's long-term performance.
-
Understanding Limitations of Vision-Language Models
-
Contrastive Representation Learning for Dynamical Systems
A deep learning method for learning a system's underlying parameters from observed trajectories