Discussion about this post

Rainbow Roxy

What an insightful read! You really manage to demystify how VLMs like GPT-4o do what they do; it's fascinating to see the elegance behind the 'magic'. This piece on contrastive learning actually connects back to your explanation of embedding spaces, and now it all clicks into place even better: relationships are learned rather than rigidly defined.

Neural Foundry

Fantastic breakdown! The magnet analogy for positive/negative pairs really clarified something I've been wrestling with. When I first encountered CLIP's embedding space, I'm not sure why, but I assumed it was mostly about pulling correct pairs closer. The insight that the repulsion between negative pairs is equally critical makes total sense now. In practice, I've noticed zero-shot performance degrades fast when there are too many visually similar but semantically distinct categories in the same batch, which probably traces back to the hard-negatives problem you mentioned.
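
For anyone who wants to see where that repulsion actually comes from, here's a minimal PyTorch sketch of a CLIP-style symmetric contrastive (InfoNCE) loss. This isn't the exact code from the post or from OpenAI's CLIP; the function name, the fixed temperature, and the assumption that the embeddings are already L2-normalized are mine. The key point is the softmax denominator: every off-diagonal (negative) pair in the batch competes with the positive pair on the diagonal, so matched pairs are pulled together while mismatched pairs in the same batch are pushed apart.

```python
# Minimal sketch of a CLIP-style symmetric contrastive loss (hypothetical helper).
# Assumes image_emb and text_emb are L2-normalized, shape (batch_size, dim).
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # Cosine similarities between every image and every text in the batch.
    # Diagonal entries are the positive pairs; off-diagonal entries are the
    # in-batch negatives that the softmax pushes away from the positives.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```

This also makes the hard-negatives observation concrete: if a batch contains many visually similar but semantically distinct categories, the off-diagonal logits for those pairs are large, and the loss has to work much harder to separate them.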


