wangkanai committed
Commit 2b73dab · verified · 1 Parent(s): 352b08c

Upload folder using huggingface_hub
README.md CHANGED
@@ -6,53 +6,52 @@ tags:
6
  - vision-language
7
  - multimodal
8
  - qwen
9
  - image-text-to-text
10
- - conversational
11
  ---
12
 
13
- <!-- README Version: v1.0 -->
14
 
15
- # Qwen3-VL-8B-Instruct
16
 
17
- Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3 series, designed for understanding and reasoning about images combined with natural language instructions. This 8-billion parameter instruction-tuned model excels at visual question answering, image captioning, optical character recognition (OCR), and complex visual reasoning tasks.
18
 
19
  ## Model Description
20
 
21
- **Qwen3-VL-8B-Instruct** is an instruction-following variant of the Qwen3 Vision-Language model family. Key capabilities include:
22
 
23
  - **Visual Understanding**: Analyze images, charts, diagrams, screenshots, and documents
24
  - **Multimodal Conversation**: Engage in multi-turn dialogues about visual content
25
  - **Optical Character Recognition**: Extract and understand text from images
26
  - **Visual Reasoning**: Answer complex questions requiring visual analysis and logical reasoning
27
  - **Document Understanding**: Process scanned documents, forms, and structured layouts
28
- - **Instruction Following**: Respond to detailed instructions about visual content
29
 
30
  **Model Architecture**: Vision Transformer encoder + Qwen3-8B language model decoder
31
- **Training**: Instruction-tuned on diverse vision-language tasks with RLHF alignment
32
  **Context Length**: Up to 32K tokens (text + visual tokens)
33
  **Languages**: Multilingual support (English, Chinese, and more)
 
34
 
35
  ## Repository Contents
36
 
37
- **Note**: This directory is currently empty. After downloading the model files, the structure will be:
38
-
39
  ```
40
  qwen3-vl-8b-instruct/
41
- ├── config.json # Model configuration (~3 KB)
42
- ├── generation_config.json # Generation parameters (~1 KB)
43
- ├── model.safetensors.index.json # Shard index (~50 KB)
44
- ├── model-00001-of-00004.safetensors # Model weights shard 1 (~5 GB)
45
- ├── model-00002-of-00004.safetensors # Model weights shard 2 (~5 GB)
46
- ├── model-00003-of-00004.safetensors # Model weights shard 3 (~5 GB)
47
- ├── model-00004-of-00004.safetensors # Model weights shard 4 (~1.5 GB)
48
- ├── preprocessor_config.json # Vision preprocessor config (~1 KB)
49
- ├── tokenizer.json # Tokenizer (~7 MB)
50
- ├── tokenizer_config.json # Tokenizer configuration (~2 KB)
51
- ├── special_tokens_map.json # Special tokens mapping (~1 KB)
52
- └── README.md # This file
53
  ```
54
 
55
- **Total Repository Size**: ~16.5 GB (FP16 precision)
56
 
57
  ## Hardware Requirements
58
 
@@ -63,7 +62,7 @@ qwen3-vl-8b-instruct/
63
  - **GPU**: NVIDIA GPU with Compute Capability 7.0+ (V100, RTX 20/30/40 series, A100, etc.)
64
 
65
  ### Recommended Requirements
66
- - **VRAM**: 24 GB+ (for longer sequences and batch processing)
67
  - **RAM**: 64 GB system memory
68
  - **Disk Space**: 30 GB+ (for model caching and optimization)
69
  - **GPU**: NVIDIA RTX 4090, A100, or H100 for optimal performance
@@ -88,13 +87,13 @@ from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
88
  from PIL import Image
89
  import torch
90
 
91
- # Load model and processor
92
  model = Qwen2VLForConditionalGeneration.from_pretrained(
93
  "E:\\huggingface\\qwen3-vl-8b-instruct",
94
  torch_dtype=torch.float16,
95
  device_map="auto"
96
  )
97
- processor = AutoProcessor.from_pretrained("E:\\huggingface\\qwen3-vl-8b-instruct")
98
 
99
  # Load and process image
100
  image = Image.open("example_image.jpg")
@@ -126,6 +125,8 @@ response = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
126
  print(response)
127
  ```
128
 
129
  ### Multi-Turn Conversation
130
 
131
  ```python
@@ -138,7 +139,7 @@ model = Qwen2VLForConditionalGeneration.from_pretrained(
138
  torch_dtype=torch.float16,
139
  device_map="auto"
140
  )
141
- processor = AutoProcessor.from_pretrained("E:\\huggingface\\qwen3-vl-8b-instruct")
142
 
143
  # Multi-turn conversation
144
  image = Image.open("chart.png")
@@ -182,7 +183,7 @@ model = Qwen2VLForConditionalGeneration.from_pretrained(
182
  torch_dtype=torch.float16,
183
  device_map="auto"
184
  )
185
- processor = AutoProcessor.from_pretrained("E:\\huggingface\\qwen3-vl-8b-instruct")
186
 
187
  # OCR from document
188
  document_image = Image.open("invoice.jpg")
@@ -206,54 +207,31 @@ response = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
206
  print(response)
207
  ```
208
 
209
- ### Batch Processing Multiple Images
210
 
211
  ```python
212
- from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
213
- from PIL import Image
214
  import torch
215
 
216
- model = Qwen2VLForConditionalGeneration.from_pretrained(
217
- "E:\\huggingface\\qwen3-vl-8b-instruct",
218
- torch_dtype=torch.float16,
219
- device_map="auto"
220
- )
221
- processor = AutoProcessor.from_pretrained("E:\\huggingface\\qwen3-vl-8b-instruct")
222
-
223
- # Process multiple images
224
- images = [Image.open(f"image_{i}.jpg") for i in range(3)]
225
- prompts = [
226
- "Describe this image briefly.",
227
- "What is the main subject?",
228
- "List all visible objects."
229
- ]
230
-
231
- messages_batch = [
232
- [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}]
233
- for prompt in prompts
234
- ]
235
-
236
- texts = [processor.apply_chat_template(msg, tokenize=False, add_generation_prompt=True) for msg in messages_batch]
237
- inputs = processor(text=texts, images=images, return_tensors="pt", padding=True).to("cuda")
238
-
239
- with torch.no_grad():
240
- output_ids = model.generate(**inputs, max_new_tokens=256)
241
 
242
- responses = processor.batch_decode(output_ids, skip_special_tokens=True)
243
- for i, response in enumerate(responses):
244
- print(f"Image {i+1}: {response}")
245
  ```
246
 
247
  ## Model Specifications
248
 
249
  ### Architecture Details
250
- - **Model Type**: Vision-Language Transformer (VLM)
251
  - **Vision Encoder**: Vision Transformer (ViT) with adaptive resolution
252
- - **Language Model**: Qwen3-8B decoder
253
  - **Parameters**: 8 billion (8B)
254
  - **Precision**: FP16 (half precision)
255
- - **Format**: SafeTensors (secure tensor format)
256
  - **Framework**: PyTorch / Transformers
 
257
 
258
  ### Input Specifications
259
  - **Image Resolution**: Adaptive (up to 1024x1024 recommended)
@@ -268,13 +246,13 @@ for i, response in enumerate(responses):
268
  - **Top-k**: 20-50 (alternative sampling method)
269
 
270
  ### Supported Tasks
271
- - Visual Question Answering (VQA)
272
  - Image Captioning
273
  - Optical Character Recognition (OCR)
274
  - Document Understanding
275
  - Chart and Diagram Analysis
276
  - Visual Reasoning
277
- - Multi-turn Visual Dialogue
278
  - Scene Understanding
279
  - Object Detection and Counting (descriptive)
280
 
@@ -379,25 +357,56 @@ outputs = model.generate(
379
  )
380
  ```
382
  ## License
383
 
384
- This model is released under the **Apache License 2.0**.
385
 
386
  You are free to:
387
- - Use the model commercially
388
  - Modify and distribute the model
389
  - Use for research and production applications
390
 
391
  Requirements:
392
  - Provide attribution to Alibaba Cloud and the Qwen team
393
  - Include the Apache 2.0 license text with distributions
394
- - State any significant modifications made
 
395
 
396
  See the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) for full terms.
397
 
398
  ## Citation
399
 
400
- If you use Qwen3-VL-8B-Instruct in your research or applications, please cite:
401
 
402
  ```bibtex
403
  @article{qwen3vl2024,
@@ -409,20 +418,23 @@ If you use Qwen3-VL-8B-Instruct in your research or applications, please cite:
409
  }
410
  ```
411
 
 
 
412
  ## Model Card Contact
413
 
414
- **Developed by**: Qwen Team, Alibaba Cloud
415
- **Model Type**: Vision-Language Model (Instruction-tuned)
 
416
  **Language(s)**: Multilingual (English, Chinese, and more)
417
- **License**: Apache 2.0
418
 
419
  ### Links and Resources
420
 
421
- - **Official Repository**: https://github.com/QwenLM/Qwen-VL
422
- - **Hugging Face Model**: https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
423
- - **Documentation**: https://qwen.readthedocs.io/
424
  - **Technical Report**: https://arxiv.org/abs/qwen3-vl (when published)
425
- - **Demo**: https://huggingface.co/spaces/Qwen/Qwen-VL-Chat
426
 
427
  ### Limitations and Considerations
428
 
@@ -431,48 +443,93 @@ If you use Qwen3-VL-8B-Instruct in your research or applications, please cite:
431
  - Performance varies with image quality and resolution
432
  - May struggle with very small text or complex layouts
433
  - Limited understanding of highly specialized domain images
434
- - Potential bias from training data
435
 
436
  **Ethical Considerations**:
437
- - Use responsibly for content moderation and filtering
438
- - Be aware of potential biases in visual understanding
439
  - Validate outputs for critical applications
440
  - Consider privacy implications when processing personal images
441
- - Follow responsible AI guidelines for deployment
442
 
443
  **Recommended Use Cases**:
444
- - Document analysis and OCR
445
- - Educational tools and accessibility
446
- - Content moderation assistance
447
- - E-commerce product analysis
448
- - Medical image analysis (with expert validation)
449
- - Scientific diagram interpretation
450
 
451
  **Not Recommended For**:
452
- - Sole decision-making in critical applications
453
- - Medical diagnosis without expert review
454
- - Legal document analysis without human verification
455
- - Security or surveillance without human oversight
456
- - Autonomous systems without safety mechanisms
 
457
 
458
- ## Download Instructions
459
 
460
- To download the model from Hugging Face:
461
 
462
- ```bash
463
- # Using huggingface-cli
464
- huggingface-cli download Qwen/Qwen3-VL-8B-Instruct --local-dir E:\huggingface\qwen3-vl-8b-instruct
465
 
466
- # Using git (with git-lfs)
467
- cd E:\huggingface
468
- git lfs install
469
- git clone https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct qwen3-vl-8b-instruct
470
  ```
471
 
472
  ## Changelog
473
 
474
- **v1.0** (Initial Release)
475
- - Base Qwen3-VL-8B-Instruct model
476
- - Support for image understanding and OCR
477
- - Multi-turn conversation capability
478
- - Optimized for instruction following
6
  - vision-language
7
  - multimodal
8
  - qwen
9
+ - abliterated
10
+ - uncensored
11
  - image-text-to-text
 
12
  ---
13
 
14
+ <!-- README Version: v1.1 -->
15
 
16
+ # Qwen3-VL-8B-Instruct (Abliterated)
17
 
18
+ This is an **abliterated** (uncensored) version of the Qwen3-VL-8B-Instruct multimodal vision-language model. The model has undergone abliteration to remove safety guardrails and content filtering, allowing unrestricted responses to all queries. This 8-billion parameter instruction-tuned model excels at visual question answering, image captioning, optical character recognition (OCR), and complex visual reasoning tasks.
19
+
20
+ **⚠️ WARNING**: This is an uncensored model variant with safety restrictions removed. Use responsibly and in compliance with applicable laws and ethical guidelines.
21
 
22
  ## Model Description
23
 
24
+ **Qwen3-VL-8B-Instruct (Abliterated)** is a modified version of the Qwen3 Vision-Language model with content filtering removed. Key capabilities include:
25
 
26
  - **Visual Understanding**: Analyze images, charts, diagrams, screenshots, and documents
27
  - **Multimodal Conversation**: Engage in multi-turn dialogues about visual content
28
  - **Optical Character Recognition**: Extract and understand text from images
29
  - **Visual Reasoning**: Answer complex questions requiring visual analysis and logical reasoning
30
  - **Document Understanding**: Process scanned documents, forms, and structured layouts
31
+ - **Uncensored Responses**: No content filtering or safety guardrails
32
 
33
  **Model Architecture**: Vision Transformer encoder + Qwen3-8B language model decoder
34
+ **Training**: Instruction-tuned on diverse vision-language tasks, then abliterated
35
  **Context Length**: Up to 32K tokens (text + visual tokens)
36
  **Languages**: Multilingual support (English, Chinese, and more)
37
+ **Modification**: Safety layers removed through abliteration process
38
 
39
  ## Repository Contents
40
 
 
 
41
  ```
42
  qwen3-vl-8b-instruct/
43
+ ├── qwen3-vl-8b-instruct-abliterated.safetensors # Complete model weights (16.33 GB)
44
+ └── README.md # This file
45
  ```
46
 
47
+ **Total Repository Size**: 16.33 GB (FP16 precision, single-file format)
48
+
49
+ **File Details**:
50
+ - **qwen3-vl-8b-instruct-abliterated.safetensors**: Complete merged model in safetensors format
51
+ - Size: 16.33 GB
52
+ - Precision: FP16 (half precision)
53
+ - Format: Single-file merged weights (not sharded)
54
+ - Contains: Full vision encoder + language model + abliteration modifications
55
 
56
  ## Hardware Requirements
57
 
 
62
  - **GPU**: NVIDIA GPU with Compute Capability 7.0+ (V100, RTX 20/30/40 series, A100, etc.)
63
 
64
  ### Recommended Requirements
65
+ - **VRAM**: 24 GB+ (RTX 4090, A6000, A100 for longer sequences)
66
  - **RAM**: 64 GB system memory
67
  - **Disk Space**: 30 GB+ (for model caching and optimization)
68
  - **GPU**: NVIDIA RTX 4090, A100, or H100 for optimal performance
 
87
  from PIL import Image
88
  import torch
89
 
90
+ # Load abliterated model from local directory
91
  model = Qwen2VLForConditionalGeneration.from_pretrained(
92
  "E:\\huggingface\\qwen3-vl-8b-instruct",
93
  torch_dtype=torch.float16,
94
  device_map="auto"
95
  )
96
+ processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
97
 
98
  # Load and process image
99
  image = Image.open("example_image.jpg")
 
125
  print(response)
126
  ```
127
 
128
+ **Note**: Since this is an abliterated model stored as a single merged file, you'll need to use a compatible processor config. Use the original Qwen2-VL processor from Hugging Face for tokenization and image processing.
129
+
130
  ### Multi-Turn Conversation
131
 
132
  ```python
 
139
  torch_dtype=torch.float16,
140
  device_map="auto"
141
  )
142
+ processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
143
 
144
  # Multi-turn conversation
145
  image = Image.open("chart.png")
 
183
  torch_dtype=torch.float16,
184
  device_map="auto"
185
  )
186
+ processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
187
 
188
  # OCR from document
189
  document_image = Image.open("invoice.jpg")
 
207
  print(response)
208
  ```
209
 
210
+ ### Loading with Safetensors Library Directly
211
 
212
  ```python
213
+ from safetensors.torch import load_file
 
214
  import torch
215
 
216
+ # Load the abliterated model weights directly
217
+ weights = load_file("E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated.safetensors")
218
 
219
+ # Inspect model structure
220
+ print("Model layers:", list(weights.keys())[:10]) # First 10 keys
221
+ print(f"Total parameters: {sum(w.numel() for w in weights.values()):,}")
222
  ```
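+
+ Before loading, you can check that the ~16 GB download is intact. A minimal sketch using only the Python standard library; the expected values come from this repository's Git LFS pointer, and the local path is illustrative:
+
+ ```python
+ import hashlib
+ import os
+
+ # Values published in this repository's LFS pointer for the weights file
+ EXPECTED_SHA256 = "97c230fa3d4c8c0f3e357ae7aa52976550528c739251c052aca63c2accc89536"
+ EXPECTED_SIZE = 17534340584  # bytes
+
+ path = "E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated.safetensors"
+
+ digest = hashlib.sha256()
+ with open(path, "rb") as f:
+     # Stream in 8 MiB chunks so the full file never sits in memory at once
+     for chunk in iter(lambda: f.read(8 * 1024 * 1024), b""):
+         digest.update(chunk)
+
+ assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch - incomplete download?"
+ assert digest.hexdigest() == EXPECTED_SHA256, "checksum mismatch - corrupted file?"
+ print("Weights file verified")
+ ```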
223
 
224
  ## Model Specifications
225
 
226
  ### Architecture Details
227
+ - **Model Type**: Vision-Language Transformer (VLM) - Abliterated
228
  - **Vision Encoder**: Vision Transformer (ViT) with adaptive resolution
229
+ - **Language Model**: Qwen3-8B decoder (safety layers removed)
230
  - **Parameters**: 8 billion (8B)
231
  - **Precision**: FP16 (half precision)
232
+ - **Format**: SafeTensors (single merged file)
233
  - **Framework**: PyTorch / Transformers
234
+ - **Modification Type**: Abliteration (safety guardrail removal)
235
 
236
  ### Input Specifications
237
  - **Image Resolution**: Adaptive (up to 1024x1024 recommended)
 
246
  - **Top-k**: 20-50 (alternative sampling method)
247
 
248
  ### Supported Tasks
249
+ - Visual Question Answering (VQA) - Uncensored
250
  - Image Captioning
251
  - Optical Character Recognition (OCR)
252
  - Document Understanding
253
  - Chart and Diagram Analysis
254
  - Visual Reasoning
255
+ - Multi-turn Visual Dialogue - Uncensored
256
  - Scene Understanding
257
  - Object Detection and Counting (descriptive)
258
 
 
357
  )
358
  ```
359
 
360
+ ## Abliteration Details
361
+
362
+ **What is Abliteration?**
363
+
364
+ Abliteration is a technique for removing safety guardrails from language models by identifying and removing the specific layers or mechanisms responsible for content filtering and refusal behaviors. This process:
365
+
366
+ 1. Analyzes model layers to identify safety-related components
367
+ 2. Removes or neutralizes these components while preserving core capabilities
368
+ 3. Results in an "uncensored" model that responds to all queries
369
+
370
+ **Implications of Abliteration**:
371
+ - ✅ No content filtering or refusal responses
372
+ - ✅ Unrestricted responses to sensitive queries
373
+ - ⚠️ No built-in safety mechanisms
374
+ - ⚠️ User responsible for ethical use and compliance
375
+ - ⚠️ May generate harmful, illegal, or unethical content if prompted
376
+
377
+ **Technical Changes**:
378
+ - Safety alignment layers removed or neutralized
379
+ - Refusal mechanisms disabled
380
+ - Content filtering bypassed
381
+ - Core reasoning and generation capabilities preserved
382
+
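+ For readers who want a concrete picture, the sketch below shows one common way abliteration is implemented in the open-source community: estimate a "refusal direction" from the difference in hidden-state activations between refused and answered prompts, then project that direction out of selected weight matrices. This is an illustrative outline only, not necessarily the exact procedure used to produce this checkpoint; the tensors and layer names are hypothetical.
+
+ ```python
+ import torch
+
+ def refusal_direction(refused_acts: torch.Tensor, answered_acts: torch.Tensor) -> torch.Tensor:
+     """Unit vector pointing from 'answered' activations toward 'refused' activations.
+
+     Both inputs are (num_prompts, hidden_size) hidden states collected at one layer.
+     """
+     direction = refused_acts.mean(dim=0) - answered_acts.mean(dim=0)
+     return direction / direction.norm()
+
+ def ablate_direction(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
+     """Remove the component of a (d_out, d_in) weight that writes along `direction`."""
+     projector = torch.outer(direction, direction)  # (d_out, d_out) rank-1 projector
+     return weight - projector @ weight
+
+ # Hypothetical usage against a loaded state dict:
+ # for name, w in state_dict.items():
+ #     if name.endswith(("o_proj.weight", "down_proj.weight")):
+ #         state_dict[name] = ablate_direction(w, direction)
+ ```
+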
383
  ## License
384
 
385
+ This model is based on Qwen3-VL-8B-Instruct, which is released under the **Apache License 2.0**.
386
+
387
+ **Important Legal Notice**:
388
+ - The abliteration process modifies the original model
389
+ - Use of this model must comply with the Apache 2.0 license terms
390
+ - Users are solely responsible for ethical use and legal compliance
391
+ - This model should not be used for illegal, harmful, or unethical purposes
392
+ - The original developers are not responsible for misuse of this modified version
393
 
394
  You are free to:
395
+ - Use the model commercially (with responsibility)
396
  - Modify and distribute the model
397
  - Use for research and production applications
398
 
399
  Requirements:
400
  - Provide attribution to Alibaba Cloud and the Qwen team
401
  - Include the Apache 2.0 license text with distributions
402
+ - State that this is a modified (abliterated) version
403
+ - Take full responsibility for outputs and usage
404
 
405
  See the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) for full terms.
406
 
407
  ## Citation
408
 
409
+ If you use Qwen3-VL-8B-Instruct (Abliterated) in your research or applications, please cite:
410
 
411
  ```bibtex
412
  @article{qwen3vl2024,
 
418
  }
419
  ```
420
 
421
+ **Note**: This is an abliterated community modification, not an official Qwen model release.
422
+
423
  ## Model Card Contact
424
 
425
+ **Original Model**: Qwen Team, Alibaba Cloud
426
+ **Model Type**: Vision-Language Model (Instruction-tuned, Abliterated)
427
+ **Modification**: Community abliteration (uncensored variant)
428
  **Language(s)**: Multilingual (English, Chinese, and more)
429
+ **License**: Apache 2.0 (modified version)
430
 
431
  ### Links and Resources
432
 
433
+ - **Original Model Repository**: https://github.com/QwenLM/Qwen-VL
434
+ - **Original Hugging Face Model**: https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
435
+ - **Qwen Documentation**: https://qwen.readthedocs.io/
436
  - **Technical Report**: https://arxiv.org/abs/qwen3-vl (when published)
437
+ - **Abliteration Resources**: Search for "LLM abliteration" for technique details
438
 
439
  ### Limitations and Considerations
440
 
 
443
  - Performance varies with image quality and resolution
444
  - May struggle with very small text or complex layouts
445
  - Limited understanding of highly specialized domain images
446
+ - **NO SAFETY FILTERS**: Will respond to any query without ethical filtering
447
 
448
  **Ethical Considerations**:
449
+ - ⚠️ **NO CONTENT FILTERING**: This model has no built-in safety mechanisms
450
+ - ⚠️ **USER RESPONSIBILITY**: You are fully responsible for ethical use
451
+ - ⚠️ **POTENTIAL FOR HARM**: May generate harmful content if prompted
452
+ - ⚠️ **LEGAL COMPLIANCE**: Ensure use complies with applicable laws
453
+ - ⚠️ **BIAS AMPLIFICATION**: Uncensored models may amplify training data biases
454
  - Validate outputs for critical applications
455
  - Consider privacy implications when processing personal images
456
+ - Use responsibly and ethically
457
 
458
  **Recommended Use Cases**:
459
+ - Research on AI safety and alignment (studying uncensored model behavior)
460
+ - Unrestricted creative content generation
461
+ - Analysis of censorship mechanisms in AI models
462
+ - Educational purposes (understanding model limitations)
463
+ - Applications where content filtering interferes with legitimate use
 
464
 
465
  **Not Recommended For**:
466
+ - Public-facing applications without additional safety layers
467
+ - Use by minors or vulnerable populations
468
+ - Automated systems without human oversight
469
+ - Medical, legal, or safety-critical applications
470
+ - Any illegal, harmful, or unethical purposes
471
+ - Production systems without additional filtering mechanisms
472
 
473
+ **Required Safeguards**:
474
+ - Implement application-level content filtering if needed
475
+ - Monitor outputs for harmful content
476
+ - Provide user warnings about uncensored nature
477
+ - Establish clear usage policies and guidelines
478
+ - Maintain human oversight for sensitive applications
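+
+ As a minimal illustration of the first safeguard above (application-level content filtering), the sketch below screens decoded text before it is shown to a user. The patterns are placeholders; a real deployment would call a proper moderation model or service rather than keyword matching:
+
+ ```python
+ import re
+
+ # Placeholder patterns; swap in a real moderation pipeline in practice.
+ BLOCKED_PATTERNS = [r"(?i)\bexample banned phrase\b"]
+
+ def screen_output(text: str) -> str:
+     """Return the model's text unless it matches a blocked pattern."""
+     if any(re.search(p, text) for p in BLOCKED_PATTERNS):
+         return "[response withheld by application-level filter]"
+     return text
+
+ # response = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
+ # print(screen_output(response))
+ ```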
479
 
480
+ ## Technical Notes
481
 
482
+ ### Single-File Format
483
+
484
+ This model is distributed as a single merged safetensors file rather than sharded weights:
485
+
486
+ **Advantages**:
487
+ - Simpler file management (one file vs. multiple shards)
488
+ - Easier to move and back up
489
+ - Consistent loading process
490
+
491
+ **Considerations**:
492
+ - Requires sufficient disk I/O bandwidth during loading
493
+ - Initial load can take longer than loading sharded weights
494
+ - Requires ~16.5 GB of free disk space
495
+
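+ If load time or memory pressure is a concern, the `safetensors` library can memory-map the single file and expose tensor names, shapes, and dtypes without reading all ~16 GB into RAM. A small sketch (the local path is illustrative):
+
+ ```python
+ from safetensors import safe_open
+
+ path = "E:\\huggingface\\qwen3-vl-8b-instruct\\qwen3-vl-8b-instruct-abliterated.safetensors"
+
+ # safe_open maps the file lazily; tensors are only materialized on request.
+ with safe_open(path, framework="pt", device="cpu") as f:
+     names = list(f.keys())
+     print(f"{len(names)} tensors in file")
+     first = names[0]
+     print(first, f.get_slice(first).get_shape())  # shape metadata only, no full read
+ ```
+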
496
+ ### Processor Configuration
497
 
498
+ Since this is a community-modified version, you'll need to use a compatible processor:
499
+
500
+ ```python
501
+ # Use the original Qwen2-VL processor for compatibility
502
+ processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
503
+
504
+ # Or create a custom processor config if needed
505
+ from transformers import Qwen2VLProcessor, Qwen2VLImageProcessor, Qwen2Tokenizer
506
+
507
+ image_processor = Qwen2VLImageProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
508
+ tokenizer = Qwen2Tokenizer.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
509
+ processor = Qwen2VLProcessor(image_processor=image_processor, tokenizer=tokenizer)
510
  ```
511
 
512
+ ### Compatibility Notes
513
+
514
+ - Requires `transformers` 4.45.0 or later (the first release with Qwen2-VL support)
515
+ - Requires PyTorch 2.0+ for optimal performance
516
+ - Flash Attention 2 requires separate installation: `pip install flash-attn`
517
+ - BitsAndBytes quantization requires: `pip install bitsandbytes`
518
+
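+ As a sketch of how those optional dependencies fit together, the snippet below loads the model with 4-bit NF4 quantization (well under the ~16 GB FP16 footprint) and opts into Flash Attention 2. It assumes the local directory contains a usable model config alongside the weights; parameter choices are illustrative:
+
+ ```python
+ import torch
+ from transformers import Qwen2VLForConditionalGeneration, BitsAndBytesConfig
+
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     "E:\\huggingface\\qwen3-vl-8b-instruct",
+     quantization_config=bnb_config,
+     device_map="auto",
+     attn_implementation="flash_attention_2",  # remove if flash-attn is not installed
+ )
+ ```
+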
519
  ## Changelog
520
 
521
+ **v1.1** (Current)
522
+ - Updated README with accurate file information
523
+ - Added abliteration details and safety warnings
524
+ - Documented single-file merged format
525
+ - Added processor configuration guidance
526
+ - Enhanced ethical considerations section
527
+
528
+ **v1.0** (Initial)
529
+ - Initial abliterated model release
530
+ - 16.33 GB single-file safetensors format
531
+ - Based on Qwen3-VL-8B-Instruct with safety layers removed
532
+
533
+ ---
534
+
535
+ **⚠️ FINAL WARNING**: This is an uncensored AI model with all safety filters removed. Use responsibly, ethically, and in compliance with all applicable laws. You are solely responsible for how you use this model and any content it generates.
qwen3-vl-8b-instruct-abliterated.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:97c230fa3d4c8c0f3e357ae7aa52976550528c739251c052aca63c2accc89536
3
+ size 17534340584