### Column squishing for multiclass updates

Score-based multiclass classifiers typically have the following form: $x$ is a $d$-dimensional input vector (perhaps engineered features, perhaps learned features), $A$ is a $d \times k$ matrix, where $k$ is the number of classes, and the prediction is given by computing a score vector $y = Ax$. (This is a blog post, not an arXiv paper, so I'm going to be a bit fast and loose with dimension ordering.) The predicted class is then taken as $\arg\max_i y_i$.
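To make the prediction rule concrete, here is a minimal pure-Python sketch. So that $y = Ax$ type-checks, it assumes $A$ is stored as $k$ rows of $d$ weights each (one row per class). The function names `scores` and `predict` and the toy weights are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

def scores(A: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Compute the score vector y = A x.

    Assumes A is stored as k rows (one per class), each holding d weights,
    so y[i] is the dot product of class i's weight row with x.
    """
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def predict(A: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Return argmax_i y_i, the index of the highest-scoring class.

    On ties, max() keeps the first maximal index (lowest class id).
    """
    y = scores(A, x)
    return max(range(len(y)), key=y.__getitem__)

# Toy example (made-up weights): d = 2 features, k = 3 classes.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
x = [0.25, 0.5]

print(scores(A, x))   # [0.25, 0.5, -0.75]
print(predict(A, x))  # 1
```

Breaking ties toward the lowest index matches what `numpy.argmax` does, which keeps the sketch consistent with the usual vectorized implementation.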