t1anwx committed on
Commit c4a78f5 · 1 Parent(s): d5b051e

Update README.md

Files changed (1)
  1. README.md +4 -68
README.md CHANGED
@@ -1,71 +1,7 @@
- ---
- license: apache-2.0
- tags:
- - dino
- - vision
- datasets:
- - imagenet-1k
- ---
 
- # Vision Transformer (small-sized model, patch size 8) trained using DINO

- Vision Transformer (ViT) model trained using the DINO method. It was introduced in the paper [Emerging Properties in Self-Supervised Vision Transformers](https://arxiv.org/abs/2104.14294) by Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin and first released in [this repository](https://github.com/facebookresearch/dino).

- Disclaimer: The team releasing DINO did not write a model card for this model, so this model card has been written by the Hugging Face team.
-
- ## Model description
-
- The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images, namely ImageNet-1k, in a self-supervised fashion at a resolution of 224x224 pixels.
-
- Images are presented to the model as a sequence of fixed-size patches (resolution 8x8), which are linearly embedded. A [CLS] token is added to the beginning of the sequence for use in classification tasks, and absolute position embeddings are added before feeding the sequence to the layers of the Transformer encoder.
-
- Note that this model does not include any fine-tuned heads.
-
- By pre-training, the model learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images, for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places the linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of the entire image.
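
As a rough, non-authoritative illustration of the two paragraphs above (not part of the original card): with 8x8 patches, a 224x224 input yields (224/8)^2 = 784 patch tokens plus one [CLS] token, i.e. 785 positions, and a downstream classifier is simply a linear layer over the [CLS] hidden state. The sketch below assumes PyTorch and a hypothetical 10-class task:

```python
import torch
from transformers import ViTModel

model = ViTModel.from_pretrained('facebook/dino-vits8')

# A random tensor stands in for a preprocessed 224x224 RGB image.
pixel_values = torch.rand(1, 3, 224, 224)
outputs = model(pixel_values=pixel_values)

# (224 / 8)^2 = 784 patch tokens + 1 [CLS] token = 785 positions.
print(outputs.last_hidden_state.shape)  # torch.Size([1, 785, 384])

# Hypothetical linear head on the [CLS] token (position 0) for a
# 10-class problem; only this layer would be trained downstream.
head = torch.nn.Linear(model.config.hidden_size, 10)
logits = head(outputs.last_hidden_state[:, 0])
print(logits.shape)  # torch.Size([1, 10])
```

In this linear-probe setup only the head is trained; the DINO backbone stays frozen.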
-
- ## Intended uses & limitations
-
- You can use the raw model for image classification. See the [model hub](https://huggingface.co/models?search=google/vit) to look for fine-tuned versions on a task that interests you.
-
- ### How to use
-
- Here is how to use this model:
-
- ```python
- from transformers import ViTImageProcessor, ViTModel
- from PIL import Image
- import requests
-
- # Fetch an example image from the COCO validation set.
- url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
- image = Image.open(requests.get(url, stream=True).raw)
-
- # Load the preprocessor and the pretrained backbone.
- processor = ViTImageProcessor.from_pretrained('facebook/dino-vits8')
- model = ViTModel.from_pretrained('facebook/dino-vits8')
-
- # Preprocess the image and extract patch-level features.
- inputs = processor(images=image, return_tensors="pt")
- outputs = model(**inputs)
- last_hidden_states = outputs.last_hidden_state
- ```
-
- ## BibTeX entry and citation info
-
- ```bibtex
- @article{DBLP:journals/corr/abs-2104-14294,
-   author    = {Mathilde Caron and
-                Hugo Touvron and
-                Ishan Misra and
-                Herv{\'{e}} J{\'{e}}gou and
-                Julien Mairal and
-                Piotr Bojanowski and
-                Armand Joulin},
-   title     = {Emerging Properties in Self-Supervised Vision Transformers},
-   journal   = {CoRR},
-   volume    = {abs/2104.14294},
-   year      = {2021},
-   url       = {https://arxiv.org/abs/2104.14294},
-   archivePrefix = {arXiv},
-   eprint    = {2104.14294},
-   timestamp = {Tue, 04 May 2021 15:12:43 +0200},
-   biburl    = {https://dblp.org/rec/journals/corr/abs-2104-14294.bib},
-   bibsource = {dblp computer science bibliography, https://dblp.org}
- }
- ```
 
+ NVIDIA DRIVE AGX is a scalable, open compute platform for autonomous vehicles that acts as the brain of the self-driving car. A leader among hardware platforms of its kind, NVIDIA DRIVE AGX delivers high-performance, energy-efficient compute for functionally safe AI-powered autonomous driving. On the hardware side, the NVIDIA DRIVE embedded supercomputing platform processes data from camera, radar, and lidar sensors to perceive the surrounding environment, localize the vehicle on a map, and then plan and execute a safe driving route. On the software side, NVIDIA DRIVE AGX is scalable and software-defined, delivering the advanced performance needed for autonomous vehicles to process large volumes of sensor data and make real-time driving decisions. The open NVIDIA DRIVE software stack also helps developers build perception, mapping, planning, and driver-monitoring functions using redundant and diverse deep neural networks (DNNs), and the platform grows more capable through continuous iteration and over-the-air updates. Meanwhile, the open NVIDIA DRIVE SDK gives developers all the building blocks and algorithm stacks required for autonomous driving, helping them build and deploy advanced autonomous-driving applications more efficiently, including perception, localization and mapping, planning and control, driver monitoring, and natural language processing. Taking Orin-x, currently the most widely deployed mainstream NVIDIA chip, as an example, the following sections explain development and application from the software level down to the hardware level.

+ 1 NVIDIA Internal Architecture Design
 
+ Taking Orin-x as an example, the CPU comprises a main CPU complex based on Arm Cortex-A78AE, which provides general-purpose high-speed compute, and a Functional Safety Island (FSI) based on Arm Cortex-R52, which provides isolated on-chip compute resources and reduces the need for external ASIL-D functional-safety CPU processing.

+ The GPU is an NVIDIA® Ampere GPU, which provides advanced parallel-processing compute for the CUDA language and supports a range of tools such as TensorRT, a deep-learning inference optimizer and runtime that delivers low latency and high throughput. Ampere
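
Since TensorRT is named above as the inference optimizer and runtime on this GPU, here is a minimal, non-authoritative sketch of building a TensorRT engine from an ONNX model. It assumes the TensorRT 8.x Python API; the file name model.onnx and the FP16 flag are illustrative assumptions, not details from the original text:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path):
    """Parse an ONNX model and build a deserialized TensorRT engine."""
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch networks are required for ONNX models in TensorRT 8.x.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse ONNX model")

    config = builder.create_builder_config()
    # Assumption: enable FP16 for lower latency on Ampere-class GPUs.
    config.set_flag(trt.BuilderFlag.FP16)

    serialized = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(TRT_LOGGER)
    return runtime.deserialize_cuda_engine(serialized)

engine = build_engine("model.onnx")  # hypothetical ONNX export
```

FP16 is a common choice on Ampere-class GPUs because their Tensor Cores accelerate half-precision math; the serialized engine can then be deployed by the target runtime.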