beehive-lab/GPULlama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
Leverages TornadoVM's JIT compilation to automatically translate native Java tensor operations to OpenCL or NVIDIA PTX, supporting multiple model architectures (Llama3, Mistral, Qwen, Phi, Granite) in GGUF format. Integrates as an official model provider in LangChain4j and Quarkus, enabling GPU-accelerated inference within existing Java AI frameworks without additional glue code.
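As background on the translation step: TornadoVM compiles ordinary Java loops into GPU kernels. A minimal sketch of the kind of tensor loop involved, written in plain JDK Java (the real project additionally uses TornadoVM's `@Parallel` annotation and task-graph API, shown here only as a comment since they require the TornadoVM dependency):

```java
public class MatVec {
    // Matrix-vector multiply over a row-major float matrix. In TornadoVM code
    // the outer loop would carry the @Parallel annotation, letting the JIT map
    // each row to a separate GPU work-item when generating OpenCL or PTX.
    static void matVec(float[] m, float[] x, float[] y, int rows, int cols) {
        for (int i = 0; i < rows; i++) { // @Parallel in TornadoVM code
            float sum = 0f;
            for (int j = 0; j < cols; j++) {
                sum += m[i * cols + j] * x[j];
            }
            y[i] = sum;
        }
    }

    public static void main(String[] args) {
        float[] m = {1, 2, 3, 4}; // 2x2 matrix, row-major
        float[] x = {1, 1};
        float[] y = new float[2];
        matVec(m, x, y, 2, 2);
        System.out.println(y[0] + " " + y[1]); // 3.0 7.0
    }
}
```

The same method body runs unchanged on the CPU, which is the point: TornadoVM's JIT retargets it rather than requiring a rewrite in a kernel language.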
Stars: 238
Forks: 28
Language: Java
License: MIT
Category:
Last pushed: Mar 11, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/beehive-lab/GPULlama3.java"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
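The same endpoint can be called from Java with the standard `java.net.http` client. A minimal sketch using only the curl URL above (the shape of the JSON response is not documented here, so this just builds and prints the request; the commented line performs the actual fetch):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class QualityApiClient {
    // Builds a GET request for the quality API shown in the curl example above.
    static HttpRequest buildRequest(String repo) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://pt-edge.onrender.com/api/v1/quality/transformers/" + repo))
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest("beehive-lab/GPULlama3.java");
        System.out.println(req.uri());
        // To actually fetch (counts against the 100 requests/day limit):
        // HttpResponse<String> resp = HttpClient.newHttpClient()
        //         .send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```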
Related models
srgtuszy/llama-cpp-swift
Swift bindings for the llama.cpp library
gitkaz/mlx_gguf_server
A FastAPI-based LLM server that can load multiple models (MLX or llama.cpp) simultaneously...
JackZeng0208/llama.cpp-android-tutorial
llama.cpp tutorial on Android phone
dougeeai/llama-cpp-python-wheels
Pre-built wheels for llama-cpp-python across platforms and CUDA versions
RhinoDevel/mt_llm
Pure C wrapper library that makes llama.cpp as simple as possible to use on Linux and Windows.