Running on webgpu?

#2
by oanaflores - opened
  1. Any suggestions for running the model with the webgpu backend?
  2. Additionally, how can I log the nodes and their assigned execution provider? Thank you in advance!
  • I've seen the model being used in the benchmarking demo https://huggingface.co/spaces/Xenova/webgpu-embedding-benchmark comparing wasm and webgpu

  • When trying to run it on webgpu (see code excerpt below), I get the following warning/error ort-wasm-simd.jsep.js: [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
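  For question 2, a hedged sketch based on onnxruntime-web's documented session options and `ort.env` settings (option names assumed from the ort-web API; raising the severity to verbose is what makes ORT print each node's assigned execution provider to the console):

  ```javascript
  // Session options for verbose per-node logging.
  // logSeverityLevel 0 = verbose, so node/EP assignments are logged.
  const sessionOptions = {
    executionProviders: ['webgpu'],
    logSeverityLevel: 0,
    logVerbosityLevel: 0,
  };

  // In the page you would additionally raise the global log level:
  // ort.env.logLevel = 'verbose';
  // ort.env.debug = true;
  // const session = await ort.InferenceSession.create(modelUrl, sessionOptions);
  ```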

  <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.webgpu.min.js"></script>
  <script type="module">
    import { AutoTokenizer } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.16.0';

    async function runInference() {
      const session = await ort.InferenceSession.create('models/Xenova/all-MiniLM-L6-v2/onnx/model_fp16.onnx', { executionProviders: ['webgpu'], logSeverityLevel: 0 });
      const tokenizer = await AutoTokenizer.from_pretrained('Xenova/all-MiniLM-L6-v2');

      const texts = ['This is an example sentence', 'Each sentence is converted']

      const { input_ids, token_type_ids, attention_mask } = tokenizer(texts, {
        padding: true,
        truncation: true
      });

      // Use the tokenizer's own dims ([batch_size, seq_len]) instead of
      // [1, data.length], which collapses the batch of 2 into one row.
      const feeds = {
        input_ids: new ort.Tensor('int64', input_ids.data, input_ids.dims),
        token_type_ids: new ort.Tensor('int64', token_type_ids.data, token_type_ids.dims),
        attention_mask: new ort.Tensor('int64', attention_mask.data, attention_mask.dims)
      };


      const result = await session.run(feeds);

...
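  In case it's useful to anyone else: the model returns token-level embeddings, so a sentence embedding is typically obtained by attention-mask-weighted mean pooling over the sequence. A minimal dependency-free sketch (`meanPool` is a hypothetical helper, not part of any library; it assumes row-major `[seq_len, hidden_size]` data for a single sentence):

  ```javascript
  // Mean-pool token embeddings into one sentence embedding,
  // counting only positions where the attention mask is 1.
  function meanPool(hidden, mask, seqLen, hiddenSize) {
    // hidden: Float32Array of length seqLen * hiddenSize (one sentence)
    // mask:   array of 0/1 values of length seqLen
    const out = new Float32Array(hiddenSize);
    let count = 0;
    for (let t = 0; t < seqLen; t++) {
      if (!mask[t]) continue;
      count++;
      for (let h = 0; h < hiddenSize; h++) {
        out[h] += hidden[t * hiddenSize + h];
      }
    }
    for (let h = 0; h < hiddenSize; h++) out[h] /= Math.max(count, 1);
    return out;
  }
  ```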
Owner

Well, does it run? :) You should be able to ignore that warning, since I believe those are shape-related ops.

Yes, it runs! Thank you for the prompt response
