Questions
- RoBERTa uses a byte-level byte-pair encoding tokenizer. (True/False)
- A trained Hugging Face tokenizer produces merges.txt and vocab.json. (True/False)
- RoBERTa does not use token type IDs. (True/False)
- DistilBERT has 6 layers and 12 heads. (True/False)
- A transformer model with 80 million parameters is enormous. (True/False)
- We cannot train a tokenizer. (True/False)
- A BERT-like model has 6 decoder layers. (True/False)
- Masked language modeling predicts the word hidden by a mask token in a sentence. (True/False)
- A BERT-like model has no self-attention sub-layers. (True/False)
- Data collators are helpful for backpropagation. (True/False)
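
As a quick reference for the tokenizer questions above, here is a minimal sketch of training a byte-level BPE tokenizer (the kind RoBERTa uses) with the Hugging Face tokenizers library. The corpus file corpus.txt and the output directory tokenizer_dir are hypothetical placeholders; saving the trained tokenizer writes the two files mentioned in the questions, vocab.json and merges.txt.

```python
from pathlib import Path

from tokenizers import ByteLevelBPETokenizer

# RoBERTa-style byte-level BPE tokenizer, trained from scratch.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train(
    files=["corpus.txt"],          # hypothetical plain-text training corpus
    vocab_size=52_000,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# save_model() writes vocab.json and merges.txt into the target directory.
Path("tokenizer_dir").mkdir(exist_ok=True)
tokenizer.save_model("tokenizer_dir")
```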
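
Similarly, for the masked language modeling and data collator questions, a minimal sketch using the transformers library, assuming the roberta-base checkpoint is available: the collator assembles padded batches and randomly masks tokens, producing the inputs and labels used to train a masked language model.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

# The collator pads examples into a batch and replaces ~15% of tokens
# with the mask token, creating labels for masked language modeling.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.15,
)

batch = data_collator([tokenizer("The Critique of Pure Reason.")])
print(batch["input_ids"])   # some token ids replaced by the mask token id
print(batch["labels"])      # original ids at masked positions, -100 elsewhere
```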