zhihu/cuBERT
Fast implementation of BERT inference that runs directly on NVIDIA GPUs (CUDA, cuBLAS) and Intel CPUs (MKL)
Eliminates TensorFlow framework overhead through direct kernel implementation, supporting mixed-precision inference (fp16 computation with fp32 accuracy) on Volta/Turing GPUs for 2x speedup. Provides thread-safe serving via `BertM` wrapper with configurable request and operation-level parallelism, plus language bindings for Python (Cython) and Java (JNA) alongside the core C++ API.
549 stars. No commits in the last 6 months.
Stars
549
Forks
84
Language
C++
License
MIT
Category
ML Frameworks
Last pushed
Nov 18, 2020
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/zhihu/cuBERT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
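The same endpoint can also be queried from a script. A minimal Python sketch using only the standard library, based on the curl example above; the `quality_url` helper and its parameters are illustrative, not part of any documented client:

```python
from urllib.parse import urljoin

# Base URL taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/"

def quality_url(owner: str, repo: str, category: str = "ml-frameworks") -> str:
    """Build the per-repository quality endpoint URL (helper name is hypothetical)."""
    return urljoin(BASE, f"{category}/{owner}/{repo}")

url = quality_url("zhihu", "cuBERT")
# To fetch the JSON payload (no key needed up to 100 requests/day):
#   import urllib.request, json
#   data = json.load(urllib.request.urlopen(url))
```

Passing an API key (for the 1,000/day tier) would follow whatever header or query-parameter scheme the service documents; that detail is not shown on this page.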
Related frameworks
dimitreOliveira/bert-as-a-service_TFX
End-to-end pipeline with TFX to train and deploy a BERT model for sentiment analysis.
ThalesGroup/ConceptBERT
Implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering
Kvasirs/MILES
MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification...
kpi6research/Bert-as-a-Library
Bert as a Library is a Tensorflow library for quick and easy training and finetuning of models...
Statistical-Impossibility/Feline-Project
Domain-adaptive NLP pipeline for feline veterinary NER using BERT