A collection of models that are able to be run using onnxruntime-genai and can be served through embeddedllm library.
			
	
	AI & ML interests
None defined yet.
Recent Activity
	View all activity
	
			Organization Card
		
		EmbeddedLLM
About EmbeddedLLM
EmbeddedLLM is an open-source company dedicated to advancing the field of Large Language Models (LLMs) through innovative backend solutions and hardware optimizations. Our mission is to make powerful generative models work on all platforms, from edge to private cloud, ensuring accessibility and efficiency for a wide range of applications.
Highlighted Repositories
- Description: JamAI Base is an open-source RAG (Retrieval-Augmented Generation) backend platform that integrates an embedded database (SQLite) and an embedded vector database (LanceDB) with managed memory and RAG capabilities. It features built-in LLM, vector embeddings, and reranker orchestration and management, all accessible through a convenient, intuitive, spreadsheet-like UI and a simple REST API.
 - Key Features:
- Embedded database (SQLite) and vector database (LanceDB)
 - Managed memory and RAG capabilities
 - Built-in LLM, vector embeddings, and reranker orchestration
 - Intuitive spreadsheet-like UI
 - Simple REST API
 
 
- Description: This repository is a port of vLLM for AMD GPUs, providing a high-throughput and memory-efficient inference and serving engine for LLMs optimized for ROCm.
 - Key Features:
- Vision Language Models support
 - New features not yet available in the upstream
 - Optimized for AMD GPUs with ROCm support
 
 
- Description: It is a AIPC embedded LLM Engine unifying and provide stable way to run LLM fast on CPU, iGPU, GPU. It supports launching OpenAI-API-Compatible API server powered by our engine.
 - Key Features:
- Supported hardwares: CPU (ONNX), AMD iGPU (ONNX-DirectML), Intel iGPU (IPEX-LLM, OpenVINO), Intel XPU (IPEX-LLM, OpenVINO), Nvidia GPU (ONNX-CUDA).
 - Provide prebuilt, ready-to-run Windows 11 executable.
 - Vision Language Models support (CPU)
 
 
Join Us
We invite you to explore our repositories and models, contribute to our projects, and join us in pushing the boundaries of what's possible with LLMs.
Model Powered by Onnxruntime DirectML GenAI
			
	
	- 
	
	
	
				EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml
Text Generation • Updated - 
	
	
	
				EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-directml
Text Generation • Updated • 2 - 
	
	
	
				EmbeddedLLM/Phi-3-medium-4k-instruct-onnx-directml
Text Generation • Updated • 3 - 
	
	
	
				EmbeddedLLM/Phi-3-medium-128k-instruct-onnx-directml
Text Generation • Updated 
A collection of models that are able to be run using onnxruntime-genai and can be served through embeddedllm library.
			
	
	Model Powered by Onnxruntime DirectML GenAI
			
	
	- 
	
	
	
				EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml
Text Generation • Updated - 
	
	
	
				EmbeddedLLM/Phi-3-mini-128k-instruct-onnx-directml
Text Generation • Updated • 2 - 
	
	
	
				EmbeddedLLM/Phi-3-medium-4k-instruct-onnx-directml
Text Generation • Updated • 3 - 
	
	
	
				EmbeddedLLM/Phi-3-medium-128k-instruct-onnx-directml
Text Generation • Updated 
			models
			100
		
			
	
	
	
	
	EmbeddedLLM/Qwen3-VL-235B-A22B-Instruct-FP8-PTPC-Quark
		
				236B
			• 
	
				Updated
					
				
				• 
					
					15
				
	
				
				
EmbeddedLLM/Qwen3-Coder-480B-A35B-Instruct-FP8-Dynamic
		
				480B
			• 
	
				Updated
					
				
				• 
					
					352
				
	
				
				
EmbeddedLLM/deepseek-r1-FP8-Dynamic
		
				671B
			• 
	
				Updated
					
				
				• 
					
					1.07k
				
	
				
				
EmbeddedLLM/Qwen2.5-1.5B-FP8-Dynamic
		
				2B
			• 
	
				Updated
					
				
				
				
	
				
				
EmbeddedLLM/Qwen2.5-1.5B-Instruct-FP8-Dynamic
		
				2B
			• 
	
				Updated
					
				
				• 
					
					2
				
	
				
				
EmbeddedLLM/Qwen2.5-32B-Instruct-FP8-Dynamic
		
				33B
			• 
	
				Updated
					
				
				• 
					
					3
				
	
				
				
EmbeddedLLM/Qwen2.5-7B-Instruct-FP8-Dynamic
		
				8B
			• 
	
				Updated
					
				
				• 
					
					3
				
	
				
				
EmbeddedLLM/deepseekv3-lite-ci
		
	
				Updated
					
				
				
				
	
				
				
EmbeddedLLM/Qwen_Qwen2.5-32B-Instruct-FP8-Dynamic
		
				33B
			• 
	
				Updated
					
				
				• 
					
					2
				
	
				
				
EmbeddedLLM/Llama-3.1-8B-Instruct-w_fp8_per_channel_sym
			Text Generation
			• 
		
				8B
			• 
	
				Updated
					
				
				• 
					
					3
				
	
				
				
			datasets
			0
		
			
	None public yet