Introduction In one of their smaller papers, “Superposition, Memorization, and Double Descent”, Anthropic investigated how model features were learnt over time under different dataset size regime...
High Dimension Computing
Introduction I recently came across an old lecture on High-Dimensional (HD) computing, in the form of this Quanta Magazine article and this Stanford CS lecture by Pentti Kanerva, and thought i...
Transformers are LSTMs v2
So I recently got into a bit of an argument with 2 friends while driving in the car to get dinner. It went a little bit like this: Me: I think LSTMs are definitely the precursor to transformers, m...
Automatic Reinforcement Unlearning
Introduction This past year has really got me deep-diving into mechanistic interpretability research. I think it makes so much sense as a computational science, is very fundamental and generalizab...
Common Spatial Pattern: Discriminator based on PCA
Principal Component Analysis (PCA) is a fundamental dimension-reduction technique, familiar to all of us, that identifies the top $k$ components of a set of $d$-dimensional data points, where $k < d$. In ...
Human Writing is as Uniform as Machine Writing
Can we build a zero-shot Large Language Model (LLM) generated text detector without knowing which LLM potentially generated a given piece of text? A Stanford NLP research project (CS 224N) done i...
Multitask Learning Across Time-Series Spectrogram Tasks
Many time-series tasks share the ability to decompose (e.g. using the Fourier Transform) and express data using spectrograms. Could this, in and of itself, be a sufficient form of “shared structure...
The Gaussian's Binomial Origins (In Progress)
The Gaussian (Normal) Distribution can be found everywhere; it is also a laughably common tendency for statisticians and engineers to describe any stochastic phenomenon using Gaussians. Though we c...
Characteristics of OLS Predictor Coefficients (beta-hat)
Linear Regression (LR) is hands-down THE most useful and ubiquitous tool in statistics. Everything derives from linear regression; even the most complex statistical models at some point have to be ...
T-test: Motivation, Definition, and Derivation
Contents: 1. Contextual Problem. 2. Defining $s^2$, a.k.a. the sample variance (an estimator of $\sigma^2$). 3. Motivating the $T$ test. 4. Deriving the $T$ statistic. 1. Contextual Problem A key objective of analysi...