عجفت الغور
clustering
ml
NVDM
- variational autoencoder with bag-of-words (BOW) inputs
- Word order is ignored, only word counts matter
- L1-normalized counts turn each document into a word probability distribution (sketch after this list)
- in contrast, embedding models start from a pretrained LM, fine-tuned on downstream tasks
- loss is computed with MSE or a contrastive objective
- contrastive learning pairs an anchor with a positive (similar) example and a negative (dissimilar) one; see the loss sketch below
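A minimal sketch of the NVDM idea in PyTorch (my own illustration, not the reference implementation): a VAE that encodes L1-normalized BOW counts into a latent topic vector and decodes back to word probabilities. Sizes like `vocab_size=10_000` and `n_topics=50` are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NVDM(nn.Module):
    def __init__(self, vocab_size=10_000, hidden=500, n_topics=50):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, n_topics)       # posterior mean
        self.logvar = nn.Linear(hidden, n_topics)   # posterior log-variance
        self.dec = nn.Linear(n_topics, vocab_size)  # latent -> word logits

    def forward(self, counts):
        # L1-normalize counts so the input is a word probability distribution
        bow = counts / counts.sum(dim=-1, keepdim=True).clamp(min=1)
        h = self.enc(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        log_probs = F.log_softmax(self.dec(z), dim=-1)
        recon = -(counts * log_probs).sum(-1)  # reconstruction of observed word counts
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (recon + kl).mean(), mu  # mu doubles as the document embedding

loss, doc_emb = NVDM()(torch.randint(0, 3, (4, 10_000)).float())
```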
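And a toy contrastive (triplet) loss to go with the last bullet, assuming cosine similarity and an illustrative margin of 0.5:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    sim_pos = F.cosine_similarity(anchor, positive, dim=-1)
    sim_neg = F.cosine_similarity(anchor, negative, dim=-1)
    # zero loss once the positive beats the negative by at least `margin`
    return F.relu(margin - sim_pos + sim_neg).mean()

a, p, n = (torch.randn(8, 384) for _ in range(3))  # stand-in embeddings
print(triplet_loss(a, p, n))
```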
RoBERTa
MPNet
- Combines both approaches: permuted language modeling and masked language modeling
- Sequence is permuted and the last tokens of the permutation are masked, then predicted (toy example below)
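A toy illustration of that input construction (not MPNet's actual code): shuffle the positions, keep a prefix visible, and predict the tail of the permutation.

```python
import random

tokens = ["the", "cat", "sat", "on", "the", "mat"]
perm = list(range(len(tokens)))
random.shuffle(perm)

n_pred = 2  # predict the last n_pred positions of the permutation
visible, to_predict = perm[:-n_pred], perm[-n_pred:]
print("visible:", [tokens[i] for i in visible])
print("masked / predicted:", [tokens[i] for i in to_predict])
```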
MiniLM
- distillation
- Teacher model teaches a student model
- all-mpnet-base-v2 is the teacher
- all-MiniLM-L6-v2 is about 5 times faster
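Both models are on the Hugging Face hub under the sentence-transformers org; loading them takes a couple of lines (assuming `pip install sentence-transformers`):

```python
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
student = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = ["A news story about markets.", "A summary of that story."]
print(teacher.encode(docs).shape)  # (2, 768)
print(student.encode(docs).shape)  # (2, 384), smaller and faster
```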
Benchmark Dataset
Multi-News
- multi-document summarization dataset with custom human-written summaries
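Multi-News is on the Hugging Face hub; a loading sketch (field names `document`/`summary` as I recall them, and newer `datasets` versions may need `trust_remote_code=True`):

```python
from datasets import load_dataset

ds = load_dataset("multi_news", split="test")
print(ds[0]["document"][:200])  # the source news articles
print(ds[0]["summary"][:200])   # the human-written summary
```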
Metrics
Accuracy
- Cosine similarity
- each news story should be closer to its own summary than to any other summary
- Use AUC to measure how well the classifier separates matching from non-matching pairs (sketch below)
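A sketch of the metric, assuming aligned embedding matrices where `stories[i]` matches `summaries[i]`: score all pairs by cosine similarity, label only the true pairings positive, and feed everything to ROC AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairing_auc(stories: np.ndarray, summaries: np.ndarray) -> float:
    s = stories / np.linalg.norm(stories, axis=1, keepdims=True)
    t = summaries / np.linalg.norm(summaries, axis=1, keepdims=True)
    sims = s @ t.T           # all-pairs cosine similarity
    labels = np.eye(len(s))  # 1 on the diagonal: story matches its own summary
    return roc_auc_score(labels.ravel(), sims.ravel())

rng = np.random.default_rng(0)
summaries = rng.normal(size=(16, 384))
stories = summaries + 0.5 * rng.normal(size=summaries.shape)  # noisy matches
print(pairing_auc(stories, summaries))
```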
Speed
- NVDM
- is NVDM actually fast? tested across batch sizes
- NVDM is actually not that fast: it's pretty slow at small batch sizes and only catches up at much larger ones (timing harness below)
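A minimal timing harness for the batch-size test (sentence-transformers API shown; swap in an NVDM forward pass for the actual comparison):

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
docs = ["some news text to embed"] * 4096

for bs in (1, 8, 64, 512):
    start = time.perf_counter()
    model.encode(docs, batch_size=bs, show_progress_bar=False)
    print(f"batch_size={bs:4d}: {time.perf_counter() - start:.2f}s")
```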