It examines current state-of-the-art (SOTA) models, namely USE, ELMo, and BERT.
It introduces different methods to evaluate those models based on the task.
It also gives a brief explanation of why the models differ.
They did not state which version of USE was used, and there are two versions: one built on a Deep Averaging Network (DAN) and one built on the Transformer architecture. The former is less accurate but more performant on longer sentences, since the DAN's compute grows roughly linearly with sentence length while the Transformer's grows quadratically.
Another thing to note is that ELMo, while contextual, is not deeply contextual, at least according to the people who created BERT: ELMo concatenates independently trained left-to-right and right-to-left representations, whereas BERT conditions on both directions jointly in every layer.
OpenAI’s GPT-2 is also absent from the comparison, and I would have liked to see it included.
There is some buzz about XLNet, but I have not read enough about it to comment in depth. What I can say is that it promises the ability to learn longer-term dependencies in text. Since a Transformer's compute cost grows quadratically with input length, I wonder how they handled that.
Other takeaways:
…without task-specific fine-tuning, it seems that BERT is not well suited to finding similar sentences (see the sketch after this list).
…USE is trained on a number of tasks, but one of the main ones is identifying the similarity between pairs of sentences. The authors note that the task was to identify “semantic textual similarity (STS) between sentence pairs scored by Pearson correlation with human judgments”. This would help explain why USE is better at the similarity task.
…Pre-trained models are your friend: most of the models published now can be fine-tuned, but you should use the pre-trained model first to get a quick idea of its suitability.