Mohammed Hamdy (mmhamdy)

AI & ML interests: NLP | Reinforcement Learning

⌚ Visiting the past with Time Machine GPT!

We are all familiar with the idea of a model suite as a series of variants of the same model that differ mainly in size, for example Llama-2 7B, Llama-2 13B, and Llama-2 70B.

But this is not always the case. Researchers from the University of Oxford, the Alan Turing Institute, and the University of Manchester introduced TimeMachineGPT (TiMaGPT), a suite of language models pretrained on data constrained to a specific period in time. Instead of the same model at various sizes, you get the same model trained on data from different times.

Using the GPT-2 architecture with 117 million parameters, they trained 12 models on Wikipedia and WMT News data from 2011 to 2022, one model per year: TiMaGPT-2011, TiMaGPT-2012, ..., TiMaGPT-2022.
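Here is a minimal sketch of how one of these year-specific models could be loaded and queried with the transformers library. I'm assuming the checkpoints load as standard GPT-2 causal LMs and that the per-year repositories follow a TiMaGPT-<year> naming scheme under the Ti-Ma organization; the exact repo ids are on the hub page linked below.

```python
# A minimal sketch, not the authors' code. The repo id is an assumption:
# check https://huggingface.co/Ti-Ma for the actual per-year checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ti-Ma/TiMaGPT-2011"  # hypothetical per-year repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The 2011 model has never seen post-2011 text, so its completions should
# reflect that year's world.
inputs = tokenizer("The most talked-about smartphone right now is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```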

🤔 But how could these models be useful?

They can be very useful. For example:

1๏ธโƒฃ Most language models are static in the sense that they are trapped in the time bubble of their pretraining data, their knowledge is limited by the cut-off date of their training dataset. In order to update their knowledge, Temporal Adaptation can be performed, which means further training on newer data. The TiMaGPT series of models can be used to study the limitations of Temporal Adaptation of language models.

2๏ธโƒฃ Word meaning can change not only with its context but also with its time of use and there is a large amount of research that focuses on understanding how embeddings shift through time. TiMaGPT will be very helpful in studying this phenomenon.

3๏ธโƒฃ One more use case in the context of Time-series forecasting and event prediction is "backtesting". Which is using historical data to evaluate new models for forecasting the future. Models like TiMaGPT (each living in its own time without any knowledge of the future/present) will be great for such a use case.

🤗 All models and datasets are on the hub: https://huggingface.co/Ti-Ma
Prompting BERT!

Zero-shot learning ability is the hottest thing about causal LLMs. You don't need to finetune them on each specific task; instead, you can prompt them and get decent performance on unseen tasks.
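For example, here is a minimal zero-shot prompting sketch using the transformers text-generation pipeline. The model id is only a placeholder: a small base model like gpt2 won't do this well, but the pattern is the same for any stronger causal LM you have access to.

```python
# A minimal sketch of zero-shot prompting a causal LM; "gpt2" is only a
# placeholder checkpoint, swap in a stronger model for real use.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Classify the sentiment of the review as Positive or Negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)
print(generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"])
```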

Unfortunately, autoencoding LLMs - like our dear friend BERT 🙋‍♂️ - lack this ability, and you need a task-specific head for each task. But what if you could prompt all the BERTs in the world?!

๐Ÿฅ Introducing Statement-Tuning ๐Ÿฅ

Now hold your horses! Don't go full-Llama on it yet. Using this finetuning approach, we can get zero-shot performance from encoders by turning a problem into a yes/no problem. Binary classification all the way down!
For example, a single entailment problem is decomposed into 3 yes/no questions, one per candidate label - see the sketch below.
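To make that concrete, here is a rough sketch of the inference side in my own words (not the paper's code): each candidate label is rewritten as a natural-language statement, and a statement-tuned binary encoder scores each one. The checkpoint id is hypothetical and stands in for an encoder finetuned this way.

```python
# A minimal sketch of statement-style zero-shot inference with an encoder.
# The checkpoint id is hypothetical; label index 1 is assumed to mean "yes/true".
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "my-org/bert-statement-tuned"  # hypothetical statement-tuned encoder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "A man is playing a guitar on stage."
hypothesis = "A person is performing music."

# One statement per entailment label; the highest "yes" probability wins.
statements = {
    "entailment":    f"Given that {premise}, it is true that {hypothesis}",
    "contradiction": f"Given that {premise}, it is false that {hypothesis}",
    "neutral":       f"Given that {premise}, it is unclear whether {hypothesis}",
}

for label, statement in statements.items():
    inputs = tokenizer(statement, return_tensors="pt")
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)
    print(label, round(probs[0, 1].item(), 3))
```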

This is still not super useful. But I like works that try to make a little more space for encoders in the current autoregressive era!

Check the paper if interested: Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning (2404.12897)