LLMs + Vulnerability-Lookup: What We’re Testing and Where We’re Headed

February 26, 2025

#AI #NLP #LLM #Text-Generation #Text-Classification #Datasets #HuggingFace

Everyone’s talking about AI, NLP and LLMs these days, and, to be honest, so are we!

Recently, we’ve been exploring how LLMs can help us make sense of the massive amount of vulnerability data we collect and improve vulnerability management—while always remembering that AI is just a tool, not a solution on its own!

The picture below gives a glimpse of what we’ve tested so far.

Architecture of Our Current Experiments

With the various vulnerability feeders of Vulnerability-Lookup (for the CVE Program, NVD, Fraunhofer FKIE, GHSA, PySec, CSAF sources, the Japan Vulnerability Database, etc.), we’ve collected over a million JSON records! This allows us to generate datasets for training and building models! 🚀
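To give an idea of what generating such a dataset involves, here is a minimal sketch that reduces CVE JSON 5.x records to (description, severity) rows. The cve-records/ directory is hypothetical, the field paths follow the CVE Program JSON 5.x schema, and each feeder format would need its own extractor; our actual pipeline may differ.

import json
from pathlib import Path

def record_to_row(record):
    """Turn one CVE JSON 5.x record into a (description, severity) row."""
    cna = record.get("containers", {}).get("cna", {})

    # First English description, if any
    text = next(
        (d["value"] for d in cna.get("descriptions", []) if d.get("lang", "").startswith("en")),
        None,
    )

    # First CVSS v3.x base severity, if any
    severity = None
    for metric in cna.get("metrics", []):
        cvss = metric.get("cvssV3_1") or metric.get("cvssV3_0")
        if cvss and "baseSeverity" in cvss:
            severity = cvss["baseSeverity"].lower()
            break

    if text and severity:
        return {"description": text, "severity": severity}
    return None

rows = []
for path in Path("cve-records").glob("*.json"):  # hypothetical local dump
    row = record_to_row(json.loads(path.read_text()))
    if row:
        rows.append(row)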

During our explorations, we realized that we can automatically retrain a BERT-based text classification model daily, using a dataset of approximately 450k rows from Vulnerability-Lookup. With powerful GPUs, a full retraining run takes only a few hours.
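To make this concrete, below is a rough sketch of what such a daily retraining job could look like with the Hugging Face Trainer. The dataset name and the label set are placeholders, not necessarily the exact ones we use.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Placeholder dataset name and label set; adapt to the real dataset.
dataset = load_dataset("CIRCL/vulnerability-scores")  # hypothetical
labels = ["low", "medium", "high", "critical"]
label2id = {l: i for i, l in enumerate(labels)}

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    enc = tokenizer(batch["description"], truncation=True,
                    padding="max_length", max_length=512)
    enc["labels"] = [label2id[s] for s in batch["severity"]]
    return enc

tokenized = dataset.map(preprocess, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(labels),
    id2label={i: l for l, i in label2id.items()},
    label2id=label2id,
)

args = TrainingArguments(
    output_dir="vulnerability-severity-classification",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    push_to_hub=False,  # set to True in the real daily job
)

Trainer(model=model, args=args, train_dataset=tokenized["train"]).train()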

As always, we aim to support the community, which is why all components of this experiment are open source.

We have plenty of ideas to go further (the missing step 5 in the picture), including:

  • Guessing CPE names using a different approach than CPE Guesser
  • Affected product/category classification
  • Vulnerability type classification, using CWE mappings
  • Automated threat intelligence tagging, using existing ATT&CK mappings
  • Estimating exploitability based on available data in vulnerabilities
  • Generating models for the Japan Vulnerability Database or for the CSAF NCSC-NL feed

The list is long and we welcome your ideas!

Demonstration

For a quick demo, check out our vulnerability classification space:

👉 https://huggingface.co/spaces/CIRCL/vulnerability-severity-classification

This is a demo of our text classification model with a mapping to CVSS scores. It is a fine-tuned model based on distilbert-base-uncased, trained on our own hardware (2× NVIDIA L40 GPUs).
We will test various BERT-based models, and of course RoBERTa.

If you’re interested in text generation, we’ve trained a larger model to assist in writing vulnerability descriptions, using GPT-2 as a base. You can find it on Hugging Face, along with usage information.
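A quick way to try a model like this is the transformers text-generation pipeline. A minimal sketch follows; the model identifier below is a placeholder, so check our Hugging Face organization page for the real name.

from transformers import pipeline

# Placeholder model identifier; see the CIRCL organization on Hugging Face
# for the actual text-generation model.
generator = pipeline("text-generation",
                     model="CIRCL/vulnerability-description-generation-gpt2")

prompt = "A cross-site scripting vulnerability in the search component of"
result = generator(prompt, max_new_tokens=60, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])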

For real-world usage, we recommend running it locally with Ollama and Open WebUI.
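As a sketch, assuming you have already imported the model into Ollama under a local tag (the tag below is a placeholder), the official ollama Python client can query it like this:

import ollama

# "vulnerability-describer" is a placeholder tag for a model imported into
# Ollama locally (e.g. via `ollama create` from a GGUF conversion).
response = ollama.generate(
    model="vulnerability-describer",
    prompt="Write a vulnerability description for: SQL injection in the login form of an e-commerce application.",
)
print(response["response"])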

Contributions

If you’d like to contribute your ideas, join us in Luxembourg on April 8–9, 2025:

👉 https://hackathon.lu/

Examples

On Hugging Face

Below are a couple of examples from the space on Hugging Face.

Tenable

Palo Alto

MISP

With Python

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the fine-tuned severity classifier and its tokenizer
model_name = "CIRCL/vulnerability-severity-classification-distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Example test input
test_description = "app/Model/Attribute.php in MISP before 2.4.198 ignores an ACL during a GUI attribute search."
inputs = tokenizer(test_description, return_tensors="pt", truncation=True, padding=True)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Print the probability distribution and the most likely severity label
print("Predictions:", predictions)
predicted_class = predictions.argmax(dim=-1).item()
print("Predicted severity:", model.config.id2label[predicted_class])

Acknowledgments

This work is made possible thanks to European funding through the NGSOTI project. And of course, thanks to all the contributors!