This is a high-level description of a simplified implementation of self-attention, the core mechanism of a transformer.
The query is simply the token in the sequence that is currently being evaluated by the transformer.
In the simplified version we can simply take the dot product of the query with each input token's embedding. The dot product gives a rough measure of similarity, though there are better ways to calculate it (such as cosine similarity, or the scaled dot product used in full transformer attention).
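A minimal sketch of this step using NumPy. The embedding values and the variable names (inputs, query, attn_scores) are made up for illustration; real embeddings would be learned and much higher-dimensional.

```python
import numpy as np

# Toy 3-dimensional embeddings for a 4-token sequence (hypothetical values).
inputs = np.array([
    [0.4, 0.1, 0.8],
    [0.5, 0.9, 0.7],
    [0.6, 0.8, 0.6],
    [0.2, 0.6, 0.3],
])

# Treat the second token as the query.
query = inputs[1]

# Unnormalized attention score for each input: its dot product with the query.
attn_scores = inputs @ query
print(attn_scores)
```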
Use something like softmax to normalize the attention scores so they all sum to 1.
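A sketch of the normalization step, again in NumPy. The score values here are hypothetical stand-ins for the dot products computed above.

```python
import numpy as np

# Hypothetical unnormalized attention scores from the dot-product step.
attn_scores = np.array([0.9, 1.5, 1.4, 0.6])

# Softmax: exponentiate, then divide by the total so the weights are
# positive and sum to 1. (Subtracting the max first is a common trick
# for numerical stability with large scores.)
attn_weights = np.exp(attn_scores) / np.exp(attn_scores).sum()
print(attn_weights, attn_weights.sum())
```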
Use the attention weights to calculate the context vector, which is the weighted sum of the inputs: each input multiplied by its corresponding attention weight.
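A sketch of the final step, assuming the hypothetical inputs and weights from the previous snippets. The matrix product below is just a compact way of writing the weighted sum.

```python
import numpy as np

# Hypothetical inputs and attention weights from the previous steps.
inputs = np.array([
    [0.4, 0.1, 0.8],
    [0.5, 0.9, 0.7],
    [0.6, 0.8, 0.6],
    [0.2, 0.6, 0.3],
])
attn_weights = np.array([0.21, 0.29, 0.28, 0.22])

# Context vector: sum of each input row scaled by its attention weight.
context_vec = attn_weights @ inputs
# Equivalent: (attn_weights[:, None] * inputs).sum(axis=0)
print(context_vec)
```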