bliutech/nlp-pdf-malware-detection
ECE 188: Computer Security. Repository for "NLP-based Malware Detection on PDFs". Uses NLP techniques and transformer models to detect malware in PDFs.
Performs static analysis by converting PDFs to byte-string sequences with variable n-gram tokenization and one-hot encoding, then feeding them into a fine-tuned transformer model, which reaches 96.67% classification accuracy. The approach targets JavaScript-based PDF exploits without requiring dynamic execution, and relies on transformer attention mechanisms to process large malware datasets in parallel. Includes a full preprocessing pipeline (split, CSV generation, training, validation) and integrates with the CIC-Evasive-PDFMal2022 benchmark dataset.
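The tokenization step described above can be illustrated with a minimal sketch. This is not the repository's code: the function names `byte_ngrams`, `build_vocab`, and `one_hot` are hypothetical, and reading "variable n-gram" as "a configurable set of n-gram sizes" is an assumption. The sample bytes are a made-up fragment, not real PDF data.

```python
from typing import Dict, List, Sequence


def byte_ngrams(data: bytes, sizes: Sequence[int] = (1, 2)) -> List[bytes]:
    """Return every contiguous byte n-gram for each requested size.

    Hypothetical helper: the n-gram sizes are an assumption, not the
    values used by the repository.
    """
    tokens: List[bytes] = []
    for n in sizes:
        for i in range(len(data) - n + 1):
            tokens.append(data[i:i + n])
    return tokens


def build_vocab(tokens: Sequence[bytes]) -> Dict[bytes, int]:
    """Give each distinct token an index, in order of first appearance."""
    vocab: Dict[bytes, int] = {}
    for tok in tokens:
        # len(vocab) is evaluated before insertion, so indices are 0, 1, 2, ...
        vocab.setdefault(tok, len(vocab))
    return vocab


def one_hot(tokens: Sequence[bytes], vocab: Dict[bytes, int]) -> List[List[int]]:
    """Encode each token as a vector with a single 1 at its vocabulary index."""
    rows: List[List[int]] = []
    for tok in tokens:
        row = [0] * len(vocab)
        row[vocab[tok]] = 1
        rows.append(row)
    return rows


# Illustrative input only: a short byte string resembling PDF name objects.
sample = b"/JS /JavaScript"
tokens = byte_ngrams(sample, sizes=(2,))
vocab = build_vocab(tokens)
encoded = one_hot(tokens, vocab)
```

In practice a one-hot matrix over a large n-gram vocabulary grows quickly, so a real pipeline would more likely pass the integer indices to the model's embedding layer; the explicit vectors here are only meant to make the encoding visible.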
No commits in the last 6 months.
Stars: 33
Forks: 3
Language: Python
License: —
Category:
Last pushed: Dec 03, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/bliutech/nlp-pdf-malware-detection"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.