Attention Mechanism Components - Quant Trader Interview Question
Difficulty: Medium
Category: Linear Algebra & Machine Learning
Practice quant interview questions from Jane Street, Citadel, Two Sigma, D. E. Shaw, and other leading quantitative finance firms.
Topics: attention-mechanism, machine-learning, linear-algebra, transformer
Problem Description
In a self-attention mechanism, we have query (Q), key (K), and value (V) matrices. Given an input sequence represented as a matrix $X$, these matrices are derived through linear transformations: $Q = XW_Q$, $K = XW_K$, and $V = XW_V$, where $W_Q$, $W_K$, and $W_V$ are learnable weight matrices.
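The projections above can be sketched in NumPy. The dimensions (sequence length 4, model dimension 8, key dimension $d_k = 8$) and the random weight initialization are illustrative assumptions, not part of the problem statement; in a trained model $W_Q$, $W_K$, $W_V$ would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: sequence length 4, model dim 8, key dim d_k = 8
seq_len, d_model, d_k = 4, 8, 8

X = rng.standard_normal((seq_len, d_model))   # input sequence, one row per token

# Learnable weight matrices (randomly initialized here for illustration)
W_Q = rng.standard_normal((d_model, d_k))
W_K = rng.standard_normal((d_model, d_k))
W_V = rng.standard_normal((d_model, d_k))

# Linear transformations from the problem statement: Q = XW_Q, K = XW_K, V = XW_V
Q = X @ W_Q
K = X @ W_K
V = X @ W_V
```

Each of Q, K, and V has one row per input token, so the attention scores $QK^\top$ form a `seq_len × seq_len` matrix of pairwise token interactions.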
How is the output of the self-attention mechanism computed, assuming the dimension of the keys is $d_k$?