Running example provided in readme on squadv2 leads to error

#1
by mweiss - opened

Context:

The Norman dynasty had a major political, cultural and military impact on medieval Europe and even the Near East. The Normans were famed for their martial spirit and eventually for their Christian piety, becoming exponents of the Catholic orthodoxy into which they assimilated. They adopted the Gallo-Romance language of the Frankish land they settled, their dialect becoming known as Norman, Normaund or Norman French, an important literary language. The Duchy of Normandy, which they formed by treaty with the French crown, was a great fief of medieval France, and under Richard I of Normandy was forged into a cohesive and formidable principality in feudal tenure. The Normans are noted both for their culture, such as their unique Romanesque architecture and musical traditions, and for their significant military accomplishments and innovations. Norman adventurers founded the Kingdom of Sicily under Roger II after conquering southern Italy on the Saracens and Byzantines, and an expedition on behalf of their duke, William the Conqueror, led to the Norman conquest of England at the Battle of Hastings in 1066. Norman cultural and military influence spread from these new European centres to the Crusader states of the Near East, where their prince Bohemond I founded the Principality of Antioch in the Levant, to Scotland and Wales in Great Britain, to Ireland, and to the coasts of north Africa and the Canary Islands.

Question:

'Who ruled the duchy of Normandy'

Stack Trace:

thread '<unnamed>' panicked at 'assertion failed: stride < max_len', /__w/tokenizers/tokenizers/tokenizers/src/tokenizer/encoding.rs:311:9
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Traceback (most recent call last):
File "/snap/pycharm-professional/290/plugins/python/helpers/pydev/pydevd.py", line 1491, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-professional/290/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/case_studies/squadv2.py", line 134, in <module>
SQuADv2().collect_local_predictions()
File "/case_studies/squadv2.py", line 104, in collect_local_predictions
pred = qa_pipeline(inp.context, inp.context)
File "/venv38/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 250, in __call__
return super().__call__(examples[0], **kwargs)
File "/venv38/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1043, in __call__
return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
File "/venv38/lib/python3.8/site-packages/transformers/pipelines/base.py", line 1064, in run_single
for model_inputs in self.preprocess(inputs, **preprocess_params):
File "/venv38/lib/python3.8/site-packages/transformers/pipelines/question_answering.py", line 275, in preprocess
encoded_inputs = self.tokenizer(
File "/venv38/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2515, in __call__
return self.encode_plus(
File "/venv38/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2588, in encode_plus
return self._encode_plus(
File "/venv38/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 499, in _encode_plus
batched_output = self._batch_encode_plus(
File "/venv38/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 426, in _batch_encode_plus
encodings = self._tokenizer.encode_batch(
pyo3_runtime.PanicException: assertion failed: stride < max_len

Reducing the context to 100 characters makes the error go away, which suggests that it is triggered by the long context. Clearly, however, truncating the context is not a real workaround.
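A rough way to reason about the `stride < max_len` assertion is as a token budget: with pair truncation, the context can only keep what is left of the maximum sequence length after the question and the special tokens, and the stride has to be smaller than that remainder. The sketch below is a simplified model of that check, not the tokenizers implementation. The values 384 and 128 are assumed pipeline defaults for `max_seq_len` and `doc_stride`, the count of 3 special tokens assumes a BERT-style pair encoding, and the question lengths are made up for illustration.

```python
def context_budget(max_seq_len: int, question_tokens: int, special_tokens: int = 3) -> int:
    """Tokens left for the context after the question and special tokens.

    special_tokens=3 is an assumption (BERT-style [CLS] q [SEP] c [SEP]).
    """
    return max_seq_len - question_tokens - special_tokens


def stride_fits(max_seq_len: int, question_tokens: int, stride: int,
                special_tokens: int = 3) -> bool:
    """True if the stride is strictly smaller than the context budget."""
    return stride < context_budget(max_seq_len, question_tokens, special_tokens)


# Assumed defaults: max_seq_len=384, doc_stride=128.
# A short question leaves plenty of room for the context window.
short_question_ok = stride_fits(384, question_tokens=8, stride=128)

# A very long "question" (hypothetical 300 tokens) leaves a budget of 81,
# which is smaller than the stride, so the precondition does not hold.
long_question_ok = stride_fits(384, question_tokens=300, stride=128)
```

Under this model, anything that inflates the first sequence (the question) shrinks the room for the context until the stride no longer fits, independent of how long the context itself is.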

Versions:
torch==1.12.0
python 3.8
transformers==4.20.1
tensorflow==2.8.0

Turns out the problem was sitting in front of the keyboard ;-)
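For anyone landing here with the same panic: the call shown in the traceback passes `inp.context` for both arguments, so the context was presumably being used as the question as well. A minimal sketch of a guard against that mix-up is below. The helper name `ask` is hypothetical, and it assumes the pipeline object accepts `question=` and `context=` keyword arguments, as the transformers question-answering pipeline does.

```python
def ask(qa_pipeline, question: str, context: str):
    """Call a question-answering pipeline with explicit keyword arguments.

    `qa_pipeline` is any callable accepting question= and context=,
    for example the object returned by
    transformers.pipeline("question-answering", ...).
    """
    if question == context:
        # Catches the copy-paste mistake of passing the context twice.
        raise ValueError("question and context are identical; check the arguments")
    return qa_pipeline(question=question, context=context)


# Usage (not executed here, requires transformers and a model):
# from transformers import pipeline
# qa_pipeline = pipeline("question-answering")
# pred = ask(qa_pipeline, inp.question, inp.context)
```

Passing the arguments by keyword makes it harder to swap or duplicate them silently than with the positional form.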

mweiss changed discussion status to closed
