bliutech/nlp-pdf-malware-detection

Repository for "NLP-based Malware Detection on PDFs", a project for ECE 188: Computer Security. Uses NLP techniques and transformer models to detect malware in PDFs.

24 / 100, Experimental

Performs static analysis by converting PDFs to byte-string sequences with variable n-gram tokenization and one-hot encoding, then feeds them into a fine-tuned transformer model, reporting 96.67% classification accuracy. The approach targets JavaScript-based PDF exploits without requiring dynamic execution, and relies on transformer attention mechanisms to process large malware datasets in parallel. Includes a full pipeline (split, CSV generation, training, validation) and integrates with the CIC-Evasive-PDFMal2022 benchmark dataset.
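The byte n-gram and one-hot steps described above can be illustrated with a minimal Python sketch. This is not the repository's code: the function names `byte_ngrams`, `build_vocab`, and `one_hot` are hypothetical, the vocabulary is built from a single sample rather than a training corpus, and the sample bytes are made up for illustration.

```python
from typing import Dict, List


def byte_ngrams(data: bytes, n: int) -> List[bytes]:
    """Slide a window of n bytes across data, advancing one byte at a time."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def build_vocab(tokens: List[bytes]) -> Dict[bytes, int]:
    """Give each distinct token an index, in order of first appearance."""
    vocab: Dict[bytes, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def one_hot(tokens: List[bytes], vocab: Dict[bytes, int]) -> List[List[int]]:
    """Encode each token as a vector with a single 1 at its vocabulary index."""
    rows = []
    for tok in tokens:
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows


# Hypothetical sample: a fragment resembling a PDF JavaScript action key.
sample = b"/JS /JavaScript"
tokens = byte_ngrams(sample, 2)   # n is the tunable ("variable") window size
vocab = build_vocab(tokens)
encoded = one_hot(tokens, vocab)  # one row per token, one column per vocab entry
```

Changing `n` changes both the sequence length and the vocabulary size, which is the trade-off a variable n-gram scheme has to tune. In practice, dense one-hot rows are usually replaced by integer token IDs fed to an embedding layer, which is mathematically equivalent.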

No commits in the last 6 months.

No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 9 / 25

Stars: 33
Forks: 3
Language: Python
License: None
Last pushed: Dec 03, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/bliutech/nlp-pdf-malware-detection"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
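The curl request above could be issued from Python roughly as follows. This is a sketch under assumptions: the `category/owner/repo` path layout is inferred from the single URL shown, the helper names `quality_url` and `fetch_quality` are hypothetical, and because the response format is not documented here, the body is returned as plain text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; the layout after it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(url: str, timeout: float = 10.0) -> str:
    """Perform the GET request and return the raw response body as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example (performs a network call, so it is left commented out):
# body = fetch_quality(quality_url("nlp", "bliutech", "nlp-pdf-malware-detection"))
```

With the unauthenticated limit of 100 requests/day, it is worth caching the response locally rather than fetching it on every run.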