t1anwx committed on
Commit c4a78f5 · 1 Parent(s): d5b051e

Update README.md

Files changed (1)
  1. README.md +4 -68
README.md CHANGED
@@ -1,71 +1,7 @@
- ---
- license: apache-2.0
- tags:
- - dino
- - vision
- datasets:
- - imagenet-1k
- ---
 
- # Vision Transformer (small-sized model, patch size 8) trained using DINO

- Vision Transformer (ViT) model trained using the DINO method. It was introduced in the paper [Emerging Properties in Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.14294) by Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin and first released in [this repository](https://github.com/facebookresearch/dino).

- Disclaimer: The team releasing DINO did not write a model card for this model, so this model card has been written by the Hugging Face team.
-
- ## Model description
-
- The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images, namely ImageNet-1k, in a self-supervised fashion at a resolution of 224x224 pixels.
-
- Images are presented to the model as a sequence of fixed-size patches (resolution 8x8), which are linearly embedded. A [CLS] token is added to the beginning of the sequence for use in classification tasks, and absolute position embeddings are added before feeding the sequence to the layers of the Transformer encoder.
-
- Note that this model does not include any fine-tuned heads.
-
- By pre-training, the model learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images, for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places the linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of the entire image.
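
As a rough, non-authoritative illustration of the two paragraphs above (not part of the original card): with 8x8 patches, a 224x224 input yields (224/8)^2 = 784 patch tokens plus one [CLS] token, i.e. 785 positions, and a downstream classifier is simply a linear layer over the [CLS] hidden state. The sketch below assumes PyTorch and a hypothetical 10-class task:

```python
import torch
from transformers import ViTModel

model = ViTModel.from_pretrained('facebook/dino-vits8')

# A random tensor stands in for a preprocessed 224x224 RGB image.
pixel_values = torch.rand(1, 3, 224, 224)
outputs = model(pixel_values=pixel_values)

# (224 / 8)^2 = 784 patch tokens + 1 [CLS] token = 785 positions.
print(outputs.last_hidden_state.shape)  # torch.Size([1, 785, 384])

# Hypothetical linear head on the [CLS] token (position 0) for a
# 10-class problem; only this layer would be trained downstream.
head = torch.nn.Linear(model.config.hidden_size, 10)
logits = head(outputs.last_hidden_state[:, 0])
print(logits.shape)  # torch.Size([1, 10])
```

In this linear-probe setup only the head is trained; the DINO backbone stays frozen.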
-
- ## Intended uses & limitations
-
- You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=google/vit) to look for fine-tuned versions on a task that interests you.
-
- ### How to use
-
- Here is how to use this model:
-
- ```python
- from transformers import ViTImageProcessor, ViTModel
- from PIL import Image
- import requests
-
- # Fetch an example image from the COCO validation set.
- url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
- image = Image.open(requests.get(url, stream=True).raw)
-
- # Load the preprocessor and the pretrained backbone.
- processor = ViTImageProcessor.from_pretrained('facebook/dino-vits8')
- model = ViTModel.from_pretrained('facebook/dino-vits8')
-
- # Preprocess the image and extract patch-level features.
- inputs = processor(images=image, return_tensors="pt")
- outputs = model(**inputs)
- last_hidden_states = outputs.last_hidden_state
- ```
-
- ## BibTeX entry and citation info
-
- ```bibtex
- @article{DBLP:journals/corr/abs-2104-14294,
-   author    = {Mathilde Caron and
-                Hugo Touvron and
-                Ishan Misra and
-                Herv{\'{e}} J{\'{e}}gou and
-                Julien Mairal and
-                Piotr Bojanowski and
-                Armand Joulin},
-   title     = {Emerging Properties in Self-Supervised Vision Transformers},
-   journal   = {CoRR},
-   volume    = {abs/2104.14294},
-   year      = {2021},
-   url       = {https://arxiv.org/abs/2104.14294},
-   archivePrefix = {arXiv},
-   eprint    = {2104.14294},
-   timestamp = {Tue, 04 May 2021 15:12:43 +0200},
-   biburl    = {https://dblp.org/rec/journals/corr/abs-2104-14294.bib},
-   bibsource = {dblp computer science bibliography, https://dblp.org}
- }
- ```
 
+ NVIDIA DRIVE AGX is a scalable, open compute platform for autonomous vehicles that acts as the brain of the self-driving car. A leader among hardware platforms of its kind, NVIDIA DRIVE AGX delivers high-performance, energy-efficient compute for functionally safe AI-powered autonomous driving. On the hardware side, the NVIDIA DRIVE embedded supercomputing platform processes data from camera, radar, and lidar sensors to perceive the surrounding environment, localize the vehicle on a map, and then plan and execute a safe driving route. On the software side, NVIDIA DRIVE AGX is scalable and software-defined, delivering the advanced performance needed for autonomous vehicles to process large volumes of sensor data and make real-time driving decisions. The open NVIDIA DRIVE software stack also helps developers build perception, mapping, planning, and driver-monitoring functions using redundant and diverse deep neural networks (DNNs), and the platform grows more capable through continuous iteration and over-the-air updates. Meanwhile, the open NVIDIA DRIVE SDK gives developers all the building blocks and algorithm stacks required for autonomous driving, helping them build and deploy advanced autonomous-driving applications more efficiently, including perception, localization and mapping, planning and control, driver monitoring, and natural language processing. Taking Orin-x, currently the most widely deployed mainstream NVIDIA chip, as an example, the following sections explain development and application from the software level down to the hardware level.

+ 1 NVIDIA Internal Architecture Design
 
+ Taking Orin-x as an example, the CPU comprises a main CPU complex based on Arm Cortex-A78AE, which provides general-purpose high-speed compute, and a Functional Safety Island (FSI) based on Arm Cortex-R52, which provides isolated on-chip compute resources and reduces the need for external ASIL-D functional-safety CPU processing.

+ The GPU is an NVIDIA® Ampere GPU, which provides advanced parallel-processing compute for the CUDA language and supports a range of tools such as TensorRT, a deep-learning inference optimizer and runtime that delivers low latency and high throughput. Ampere
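
Since TensorRT is named above as the inference optimizer and runtime on this GPU, here is a minimal, non-authoritative sketch of building a TensorRT engine from an ONNX model. It assumes the TensorRT 8.x Python API; the file name model.onnx and the FP16 flag are illustrative assumptions, not details from the original text:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path):
    """Parse an ONNX model and build a deserialized TensorRT engine."""
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch networks are required for ONNX models in TensorRT 8.x.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    # Assumption: enable FP16 for lower latency on Ampere-class GPUs.
    config.set_flag(trt.BuilderFlag.FP16)

    serialized = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(TRT_LOGGER)
    return runtime.deserialize_cuda_engine(serialized)

engine = build_engine("model.onnx")  # hypothetical ONNX export
```

FP16 is a common choice on Ampere-class GPUs because their Tensor Cores accelerate half-precision math; the serialized engine can then be deployed by the target runtime.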