Coming Soon My previous blog post gives a very clear visualization of what the latents of simple ReLU networks look like, how to interpret them, and a good description of optimization pressures that...
Optimization Failure
In my previous post, “Superposition - An Actual Image of Latent Spaces”, I illustrate how the parameters of a toy ReLU auto-encoder ($W$, $b$, and $\text{ReLU}$) work together to allow models repr...
Superposition - An Actual View of Latent Spaces
This post is the prequel to the next post, “Optimization Failure”, where I investigate how, even in cases where perfectly symmetric, ideal weight configurations exist, ReLU toy models (following An...
High Dimension Computing
Introduction I recently came across an old lecture on High-Dimensional (HD) computing, in the form of this Quanta Magazine article and this Stanford CS lecture by Pentti Kanerva, and thought i...
Transformers are LSTMs v2
So I recently got into a bit of an argument with two friends while driving to get dinner. It went a little bit like this: Me: I think LSTMs are definitely the precursor to transformers, m...
Automatic Reinforcement Unlearning
Introduction This past year has really got me deep-diving into mechanistic interpretability research. I think it makes so much sense as a computational science, is very fundamental and generalizab...
Common Spatial Pattern: Discriminator based on PCA
Principal Component Analysis (PCA) is a fundamental dimension-reduction technique that we all know for identifying the top $k$ components of a set of $d$-dimensional data points, where $k < d$. In ...
Human Writing is as Uniform as Machine Writing
Can we build a zero-shot detector for text generated by a Large Language Model (LLM) without knowing which LLM potentially generated a given piece of text? A Stanford NLP research project (CS 224N) done i...
Multitask Learning Across Time-Series Spectrogram Tasks
Many time-series tasks share the ability to decompose (e.g. using the Fourier Transform) and express data using spectrograms. Could this, in and of itself, be a sufficient form of “shared structure...
The Gaussian's Binomial Origins (In Progress)
The Gaussian (Normal) Distribution can be found everywhere; it is also a laughably common tendency for statisticians and engineers to describe any stochastic phenomenon using Gaussians. Though we c...