عجفت الغور
clustering
ml
NVDM
- variational autoencoder with bag-of-words (BOW) inputs
- Word order is ignored, only word counts matter
- L1-normalized counts turn each document into a word probability distribution (sketch after this list)
- in contrast, embedding models start from a pretrained LM, fine-tuned on downstream tasks
- loss is computed with MSE or a contrastive objective
- contrastive learning pairs an anchor with a positive (similar) example and a negative (dissimilar) one; see the loss sketch below
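A minimal sketch of the NVDM idea in PyTorch (my own illustration, not the reference implementation): a VAE that encodes L1-normalized BOW counts into a latent topic vector and decodes back to word probabilities. Sizes like `vocab_size=10_000` and `n_topics=50` are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NVDM(nn.Module):
    def __init__(self, vocab_size=10_000, hidden=500, n_topics=50):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(vocab_size, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, n_topics)       # posterior mean
        self.logvar = nn.Linear(hidden, n_topics)   # posterior log-variance
        self.dec = nn.Linear(n_topics, vocab_size)  # latent -> word logits

    def forward(self, counts):
        # L1-normalize counts so the input is a word probability distribution
        bow = counts / counts.sum(dim=-1, keepdim=True).clamp(min=1)
        h = self.enc(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        log_probs = F.log_softmax(self.dec(z), dim=-1)
        recon = -(counts * log_probs).sum(-1)  # reconstruction of observed word counts
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (recon + kl).mean(), mu  # mu doubles as the document embedding

loss, doc_emb = NVDM()(torch.randint(0, 3, (4, 10_000)).float())
```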
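And a toy contrastive (triplet) loss to go with the last bullet, assuming cosine similarity and an illustrative margin of 0.5:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    sim_pos = F.cosine_similarity(anchor, positive, dim=-1)
    sim_neg = F.cosine_similarity(anchor, negative, dim=-1)
    # zero loss once the positive beats the negative by at least `margin`
    return F.relu(margin - sim_pos + sim_neg).mean()

a, p, n = (torch.randn(8, 384) for _ in range(3))  # stand-in embeddings
print(triplet_loss(a, p, n))
```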
RoBERTa
MPNet
- Combines both approaches: permuted language modeling and masked language modeling
- Sequence is permuted and the last tokens of the permutation are masked, then predicted (toy example below)
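A toy illustration of that input construction (not MPNet's actual code): shuffle the positions, keep a prefix visible, and predict the tail of the permutation.

```python
import random

tokens = ["the", "cat", "sat", "on", "the", "mat"]
perm = list(range(len(tokens)))
random.shuffle(perm)

n_pred = 2  # predict the last n_pred positions of the permutation
visible, to_predict = perm[:-n_pred], perm[-n_pred:]
print("visible:", [tokens[i] for i in visible])
print("masked / predicted:", [tokens[i] for i in to_predict])
```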
MiniLM
- distillation
- Teacher model teaches a student model
- all-mpnet-base-v2 is the teacher
- all-MiniLM-L6-v2 is about 5 times faster
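Both models are on the Hugging Face hub under the sentence-transformers org; loading them takes a couple of lines (assuming `pip install sentence-transformers`):

```python
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
student = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

docs = ["A news story about markets.", "A summary of that story."]
print(teacher.encode(docs).shape)  # (2, 768)
print(student.encode(docs).shape)  # (2, 384), smaller and faster
```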
Benchmark Dataset
Multi-News
- multi-document summarization dataset with custom human-written summaries
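Multi-News is on the Hugging Face hub; a loading sketch (field names `document`/`summary` as I recall them, and newer `datasets` versions may need `trust_remote_code=True`):

```python
from datasets import load_dataset

ds = load_dataset("multi_news", split="test")
print(ds[0]["document"][:200])  # the source news articles
print(ds[0]["summary"][:200])   # the human-written summary
```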
Metrics
Accuracy
- Cosine similarity
- each news story should be closer to its own summary than to any other summary
- Use AUC to measure how well the classifier separates matching from non-matching pairs (sketch below)
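A sketch of the metric, assuming aligned embedding matrices where `stories[i]` matches `summaries[i]`: score all pairs by cosine similarity, label only the true pairings positive, and feed everything to ROC AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairing_auc(stories: np.ndarray, summaries: np.ndarray) -> float:
    s = stories / np.linalg.norm(stories, axis=1, keepdims=True)
    t = summaries / np.linalg.norm(summaries, axis=1, keepdims=True)
    sims = s @ t.T           # all-pairs cosine similarity
    labels = np.eye(len(s))  # 1 on the diagonal: story matches its own summary
    return roc_auc_score(labels.ravel(), sims.ravel())

rng = np.random.default_rng(0)
summaries = rng.normal(size=(16, 384))
stories = summaries + 0.5 * rng.normal(size=summaries.shape)  # noisy matches
print(pairing_auc(stories, summaries))
```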
Speed
- NVDM
- is NVDM actually fast? tested across batch sizes
- NVDM is actually not that fast: it's pretty slow at small batch sizes and only catches up at much larger ones (timing harness below)
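A minimal timing harness for the batch-size test (sentence-transformers API shown; swap in an NVDM forward pass for the actual comparison):

```python
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
docs = ["some news text to embed"] * 4096

for bs in (1, 8, 64, 512):
    start = time.perf_counter()
    model.encode(docs, batch_size=bs, show_progress_bar=False)
    print(f"batch_size={bs:4d}: {time.perf_counter() - start:.2f}s")
```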