Fix: Extend max_length for proper text reconstruction
app.py CHANGED
@@ -105,13 +105,16 @@ class B2NLTokenizer:
 
         # Reconstruct (full text, not truncated)
         with torch.no_grad():
+            # Calculate appropriate max_length based on input
+            max_gen_length = max(48, min(len(text) + 10, 512))  # Allow some extra space
 
-            reconstructed = self.model.generate(text, temperature=temperature)
+            reconstructed = self.model.generate(text, temperature=temperature, max_length=max_gen_length)
+
+            # For long texts, ensure we get full reconstruction
             if text_bytes > 48:
+                # Current model limitation: may not fully reconstruct very long texts
+                # This is due to sliding window processing
                 full_reconstruction = reconstructed
-                # Note: Current implementation may truncate, this is a known limitation
             else:
                 full_reconstruction = reconstructed
 
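The max_gen_length clamp bounds generation on both ends: never below the model's 48-byte window, never above 512, and otherwise the input length plus a little headroom. A minimal sketch of the boundary behaviour (the helper name pick_max_length is illustrative; only the 48/512 bounds and the +10 headroom come from the diff):

    def pick_max_length(text: str) -> int:
        # Same clamp as in the diff: floor of 48, ceiling of 512,
        # otherwise input length plus 10 characters of headroom.
        return max(48, min(len(text) + 10, 512))

    assert pick_max_length("") == 48            # short inputs stay at the 48 floor
    assert pick_max_length("a" * 100) == 110    # mid-range inputs get +10 headroom
    assert pick_max_length("a" * 600) == 512    # long inputs hit the 512 ceiling

One caveat: len(text) counts characters, not bytes, so multi-byte UTF-8 input (the case the text_bytes > 48 branch cares about) gets less headroom than its byte length suggests; len(text.encode("utf-8")) would track the byte budget more closely.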