zhihu/cuBERT

Fast implementation of BERT inference directly on NVIDIA (CUDA, cuBLAS) and Intel MKL

/ 100

Emerging

Eliminates TensorFlow framework overhead by implementing the kernels directly, and supports mixed-precision inference (fp16 computation with fp32 accuracy) on Volta/Turing GPUs for a 2x speedup. Provides thread-safe serving via the `BertM` wrapper, with configurable request-level and operation-level parallelism, plus language bindings for Python (Cython) and Java (JNA) alongside the core C++ API.
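To make the idea of request-level parallelism behind a thread-safe wrapper concrete, here is a minimal C++ sketch. It is not cuBERT's actual `BertM` interface: `ToyModel`, `ModelPool`, and `serve_concurrently` are hypothetical names invented for illustration, and the "inference" is a placeholder sum. The assumption is that each model instance is not safe to call from two threads at once, so the wrapper keeps several instances and guards each with its own mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <future>
#include <mutex>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one single-threaded model instance (not cuBERT's API).
class ToyModel {
public:
    // Placeholder "inference": returns the sum of the token ids.
    float infer(const std::vector<int>& token_ids) {
        return static_cast<float>(
            std::accumulate(token_ids.begin(), token_ids.end(), 0));
    }
};

// Hypothetical thread-safe wrapper: N model instances, each guarded by its own mutex.
// N bounds how many requests can run at the same time.
class ModelPool {
public:
    explicit ModelPool(std::size_t n) : models_(n), locks_(n) {
        assert(n > 0);
    }

    // Safe to call from many threads: picks a slot round-robin and locks it.
    float infer(const std::vector<int>& token_ids) {
        const std::size_t slot = next_.fetch_add(1) % models_.size();
        std::lock_guard<std::mutex> guard(locks_[slot]);
        return models_[slot].infer(token_ids);
    }

private:
    std::vector<ToyModel> models_;
    std::vector<std::mutex> locks_;
    std::atomic<std::size_t> next_{0};
};

// Submits every request on its own task and collects results in request order.
std::vector<float> serve_concurrently(
    ModelPool& pool, const std::vector<std::vector<int>>& requests) {
    std::vector<std::future<float>> pending;
    pending.reserve(requests.size());
    for (const auto& req : requests) {
        pending.push_back(std::async(std::launch::async,
                                     [&pool, &req] { return pool.infer(req); }));
    }
    std::vector<float> results;
    results.reserve(pending.size());
    for (auto& f : pending) {
        results.push_back(f.get());
    }
    return results;
}
```

Calling `serve_concurrently(pool, requests)` with a `ModelPool pool(2)` lets at most two requests execute at once while the caller still receives results in submission order. A real serving wrapper would also need to manage device memory and batching, which this sketch leaves out.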

549 stars. No commits in the last 6 months.

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 21 / 25


Stars

549

Forks

Language

C++

License

MIT

Related frameworks

dimitreOliveira/bert-as-a-service_TFX

End-to-end pipeline with TFX to train and deploy a BERT model for sentiment analysis.

ThalesGroup/ConceptBERT

Implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering

Kvasirs/MILES

MILES is a multilingual text simplifier inspired by LSBert - A BERT-based lexical simplification...

kpi6research/Bert-as-a-Library

Bert as a Library is a TensorFlow library for quick and easy training and finetuning of models...

Statistical-Impossibility/Feline-Project

Domain-adaptive NLP pipeline for feline veterinary NER using BERT
