Amagibaba

Optimization Failure

Introduction In one of their smaller papers, “Superposition, Memorization, and Double Descent”, Anthropic investigated how model features were learnt over time under different dataset size regime...

High Dimension Computing

Introduction I recently came across an old lecture on High-Dimensional (HD) computing, in the form of this Quanta Magazine article and this Stanford CS lecture by Pentti Kanerva, and thought i...

Transformers are LSTMs v2

So I recently got into a bit of an argument with two friends while driving to dinner. It went a little bit like this: Me: I think LSTMs are definitely the precursor to transformers, m...

Automatic Reinforcement Unlearning

Introduction This past year has really got me deep-diving into mechanistic interpretability research. I think it makes so much sense as a computational science, is very fundamental and generalizab...

Common Spatial Pattern: Discriminator based on PCA

Principal Component Analysis (PCA) is a fundamental dimension-reduction technique for identifying the top $k$ components of a set of $d$-dimensional data points, where $k < d$. In ...

Human Writing is as Uniform as Machine Writing

Can we build a zero-shot detector for text generated by a Large Language Model (LLM) without knowing which LLM potentially generated a given piece of text? A Stanford NLP research project (CS 224N) done i...

Multitask Learning Across Time-Series Spectrogram Tasks

Many time-series tasks share the ability to decompose (e.g. using the Fourier Transform) and express data using spectrograms. Could this, in and of itself, be a sufficient form of “shared structure...

The Gaussian's Binomial Origins (In Progress)

The Gaussian (Normal) Distribution can be found everywhere; it is also a laughably common tendency for statisticians and engineers to describe any stochastic phenomenon using Gaussians. Though we c...

Characteristics of OLS Predictor Coefficients (beta-hat)

Linear Regression (LR) is hands-down THE most useful and ubiquitous tool in statistics. Everything derives from linear regression; even the most complex statistical models at some point have to be ...

T-test: Motivation, Definition, and Derivation

Contents: Contextual Problem. Defining $s^2$, a.k.a. the sample variance ($\sigma^2$). Motivating the $T$ test. Deriving the $T$ statistic. 1. Contextual Problem A key objective of analysi...