Privacy Filter Nigeria: A context-aware domain adapter for Nigerian identity systems
OpenRay.ai
PROJECT 01 / NAIJA PRIVACY FILTER

Teaching privacy models to understand Nigerian data.

A LoRA adapter on OpenAI's Privacy Filter base model that adds five Nigerian identity-document labels — NIN, BVN, passport, driver's license, and voter's card — on top of the eight generic categories the base model already covers, and adapts the rest of the taxonomy for Nigerian formats.

NAIJA PRIVACY FILTER
LoRA / 13 LABELS / TEST F1 0.964
PUBLISHED ON HUGGINGFACE 2026

Why a Nigerian adapter?

OpenAI's Privacy Filter performs context-aware detection on unstructured text — reading the surrounding language and entity co-occurrence to type each span — so two numbers with identical formats can resolve to different labels depending on how each is referenced. Heuristic detectors can't make that call.

Banking, fintech, lending, and identity-verification workflows generate PII-heavy text: KYC notes, compliance reviews, fraud-investigation logs, OCR output from NIMC slips and bank statements, and customer-service tickets quoting NUBAN account numbers and phone numbers in local formats.

A generic model catches the easy cases but tends to miss or over-collapse the locally important entities, losing information you need for retention, access control, sharing decisions, and audit.

Input text

“My name is Ciroma John. I live at No. 11 Yaba street Ikeja. Review completed with no further action required at this stage; the record was checked against 22443690465 and 34488606925 for consistency with the supporting documentation.”

Base privacy-filter
JSON
{
  "entities": [
    { "entity_group": "private_person",  "word": "Ciroma John",              "score": 1.0 },
    { "entity_group": "private_address", "word": "No. 11 Yaba street Ikeja", "score": 1.0 },
    { "entity_group": "account_number",  "word": "22443690465",              "score": 1.0 },
    { "entity_group": "account_number",  "word": "34488606925",              "score": 1.0 }
  ]
}

With Nigeria adapter
JSON
{
  "entities": [
    { "entity_group": "private_person",  "word": "Ciroma John",              "score": 1.00 },
    { "entity_group": "private_address", "word": "No. 11 Yaba street Ikeja", "score": 0.998 },
    { "entity_group": "private_bvn",     "word": "22443690465",              "score": 0.991 },
    { "entity_group": "private_nin",     "word": "34488606925",              "score": 1.00 }
  ]
}

The base model labels both 11-digit numbers as a generic account_number. The adapter resolves them as a BVN and a NIN — the labels you actually need.

Under the Nigeria Data Protection Act 2023, a BVN and a NIN sit in different regulatory regimes with different handling expectations for retention, access control, sharing decisions, and audit. A pipeline that collapses them into one label loses the information needed to apply those rules correctly.
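
To make the limitation of format-only detection concrete, here is a minimal sketch run on the second half of the example input. The regular expression and the `eleven_digit_number` label are illustrative assumptions, not part of the project. Both values match the same 11-digit pattern, so format alone gives no basis for choosing between BVN and NIN.

```python
import re

# Assumption: a purely format-based rule, "any standalone run of 11 digits".
ELEVEN_DIGITS = re.compile(r"(?<!\d)\d{11}(?!\d)")

text = (
    "the record was checked against 22443690465 and 34488606925 "
    "for consistency with the supporting documentation."
)

# Both identifiers match the same pattern, so the rule can only emit one label.
matches = ELEVEN_DIGITS.findall(text)
for value in matches:
    print({"entity_group": "eleven_digit_number", "word": value})
```

Telling the two apart requires the surrounding context, which is the signal the adapter is trained to use.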

What it detects

| Entity Label | Description | Example Match |
| --- | --- | --- |
| private_person | Person names and name-like references | Amina Yusuf |
| private_phone | Nigerian local and international phone formats | +234 802 111 3344 |
| private_email | Email addresses | amina.yusuf@example.ng |
| private_address | Street, city, state, or postal addresses | 42 Unity Road, Ikeja, Lagos 100271 |
| private_date | Dates tied to a person, record, or event | 12 April 1988 |
| private_bvn | Bank Verification Number references and values | 22334455667 |
| private_nin | National Identification Number references and values | 12345678901 |
| account_number | NUBAN-style Nigerian bank account numbers | 6318826391 |
| private_passport_number | Nigerian passport identifiers | B05995318 |
| private_drivers_license_number | Nigerian driver license identifiers | K2BHY7F6FEA0 |
| private_voters_card_number | Nigerian voter card identifiers | ABCD 1234 5678 9012 345 |
| private_url | URLs tied to private records or workflows | https://claims.example/record/1234 |
| secret | Credentials, authorization codes, session tokens | S3cure!9037Ops |

Performance

The adapter was evaluated on a private mixed dataset combining synthetic examples, OCR-derived samples from Nigerian identity documents, and real-world domain text. Sensitive fields in the source data were annotated for evaluation and then redacted before any further use; source materials and derived artifacts are not redistributed.

Recall-oriented (v0.1 research preview)

Validation and test results on this dataset are strong, but the adapter is intentionally recall-oriented. The hard-negative challenge split — text containing benign identifier-like numbers such as invoice IDs and internal reference codes — shows the model still over-redacts in some cases. For precision-sensitive use, downstream users should add filters, tune thresholds, or finetune further on their own representative data.
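
One way to trade recall for precision is a per-label score threshold applied to the adapter's output. The sketch below assumes entities shaped like the JSON in the example above (`entity_group`, `word`, `score`). The function name, the threshold values, and the sample predictions are hypothetical placeholders, to be tuned on your own data rather than taken as recommendations.

```python
from typing import Dict, List

# Hypothetical thresholds; tune these on your own representative data.
DEFAULT_THRESHOLD = 0.5
LABEL_THRESHOLDS: Dict[str, float] = {
    "account_number": 0.9,
    "private_bvn": 0.8,
    "private_nin": 0.8,
}

def filter_entities(entities: List[dict]) -> List[dict]:
    """Keep entities whose score meets the threshold for their label."""
    kept = []
    for entity in entities:
        threshold = LABEL_THRESHOLDS.get(entity["entity_group"], DEFAULT_THRESHOLD)
        if entity["score"] >= threshold:
            kept.append(entity)
    return kept

# Made-up predictions for illustration: one confident BVN, one weak match
# on an invoice-style reference.
predictions = [
    {"entity_group": "private_bvn", "word": "22443690465", "score": 0.991},
    {"entity_group": "account_number", "word": "INV-2048119", "score": 0.62},
]
print(filter_entities(predictions))
```

Raising a label's threshold removes low-confidence matches for that label only, so recall on the other labels is unaffected.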

Test F1: 0.964
Test Precision: 0.959
Test Recall: 0.969
Validation F1: 0.976

The challenge split is hard-negative-only: the false-positive example rate is 0.72 across 250 examples. Because the split contains no gold entities, typed F1 is not a meaningful metric on it.
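
On a split with no gold entities, every prediction is a false positive. The sketch below computes an example-level false-positive rate, assuming the term means the share of examples with at least one predicted entity; the page does not spell out the exact definition, and the function name and toy inputs are hypothetical.

```python
from typing import List

def false_positive_example_rate(predictions_per_example: List[List[dict]]) -> float:
    """Share of hard-negative examples with at least one predicted entity."""
    if not predictions_per_example:
        raise ValueError("need at least one example")
    flagged = sum(1 for entities in predictions_per_example if entities)
    return flagged / len(predictions_per_example)

# Toy input (made up): four hard-negative examples, three with a prediction.
toy = [
    [{"entity_group": "account_number", "word": "4471902266", "score": 0.8}],
    [],
    [{"entity_group": "private_nin", "word": "90817263545", "score": 0.7}],
    [{"entity_group": "secret", "word": "REF-77A1", "score": 0.6}],
]
rate = false_positive_example_rate(toy)
print(rate)
```

Under that reading, a rate of 0.72 on 250 examples would correspond to 180 flagged examples.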

How to use

CLI Usage

Sync deps and run the published Naija LoRA adapter on top of OpenAI's Privacy Filter base model.

BASH
uv sync

uv run python main.py \
  --model-name openai/privacy-filter \
  --adapter-name iamSamurai/privacy-filter-nigeria \
  "My name is Harry Potter and my email is harry.potter@hogwarts.edu."

REST API

Serve predictions over HTTP. The base model and adapter are env-driven so you can swap checkpoints without code changes.

BASH
PRIVACY_FILTER_MODEL_NAME=openai/privacy-filter \
PRIVACY_FILTER_ADAPTER_NAME=iamSamurai/privacy-filter-nigeria \
uv run uvicorn api:app --reload

BASH
curl -X POST http://127.0.0.1:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text":"Amina Yusuf can be reached at +234 802 111 3344.","mode":"cleaned"}'
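
The same request can be sent from Python. This is a minimal sketch using only the standard library. It mirrors the curl call (endpoint, `text` and `mode` fields) and assumes nothing about the response beyond it being JSON. The helper names `build_predict_request` and `predict` are hypothetical.

```python
import json
import urllib.request

def build_predict_request(text: str, mode: str = "cleaned",
                          base_url: str = "http://127.0.0.1:8000") -> urllib.request.Request:
    """Build the same POST /predict request as the curl example."""
    body = json.dumps({"text": text, "mode": mode}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(text: str, mode: str = "cleaned") -> object:
    """Send the request and decode the response (assumed to be JSON)."""
    with urllib.request.urlopen(build_predict_request(text, mode), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Building the request does not touch the network; predict() does.
request = build_predict_request("Amina Yusuf can be reached at +234 802 111 3344.")
print(request.get_method(), request.full_url)
```

With the server from the previous step running, `predict("Amina Yusuf can be reached at +234 802 111 3344.")` sends the request and returns the decoded response.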

Future Work

Support for more local formats

Extend coverage to Nigerian identifier and document formats that the current label set does not yet represent.

Per-label evaluation

Track per-label precision, recall, F1, boundary accuracy, and false-positive rates on numeric IDs and generic locations as separate signals — not a single F1 number.
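
A minimal sketch of per-label scoring follows. As a simplification it matches on exact `(label, text)` pairs rather than character offsets, so it does not measure boundary accuracy. The function name and the toy gold and predicted lists are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Span = Tuple[str, str]  # (label, exact text)

def per_label_scores(gold: Iterable[Span], pred: Iterable[Span]) -> Dict[str, Dict[str, float]]:
    """Exact-match precision, recall and F1 for each label."""
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    true_pos = gold_counts & pred_counts  # multiset intersection
    labels = {label for label, _ in gold_counts} | {label for label, _ in pred_counts}
    scores = {}
    for label in sorted(labels):
        tp = sum(n for (l, _), n in true_pos.items() if l == label)
        n_pred = sum(n for (l, _), n in pred_counts.items() if l == label)
        n_gold = sum(n for (l, _), n in gold_counts.items() if l == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold if n_gold else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores

# Toy case: the NIN is mislabeled as a BVN.
gold = [("private_bvn", "22443690465"), ("private_nin", "34488606925")]
pred = [("private_bvn", "22443690465"), ("private_bvn", "34488606925")]
scores = per_label_scores(gold, pred)
for label, values in scores.items():
    print(label, values)
```

In the toy case the mislabeled NIN lowers precision for `private_bvn` and recall for `private_nin`, two separate signals that a single aggregate F1 would blend together.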

Expanded OOD eval

Extend the private mixed eval with broader OOD coverage — mixed naming conventions, non-Lagos addresses, code-switched text, and multi-ID records — and publish a reproducible OOD slice.

Hybrid postprocessing

Pair the model with regex and recognizer postprocessing for deterministic entity types (emails, URLs, phone numbers, structured IDs) rather than relying on the model alone.
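
The idea can be sketched as a merge step that appends regex matches the model did not report. The two patterns, the fixed score of 1.0 for regex hits, and the function name are assumptions for illustration; a production recognizer would need broader format coverage and offset-based deduplication.

```python
import re
from typing import List

# Assumed patterns for illustration only; real recognizers need more care.
RECOGNIZERS = {
    "private_email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "private_phone": re.compile(r"\+234[ -]?\d{3}[ -]?\d{3}[ -]?\d{4}"),
}

def add_regex_entities(text: str, model_entities: List[dict]) -> List[dict]:
    """Append regex matches whose text the model did not already report."""
    seen = {entity["word"] for entity in model_entities}
    merged = list(model_entities)
    for label, pattern in RECOGNIZERS.items():
        for match in pattern.finditer(text):
            word = match.group()
            if word not in seen:
                # Assumption: deterministic matches get a fixed score of 1.0.
                merged.append({"entity_group": label, "word": word, "score": 1.0})
                seen.add(word)
    return merged

text = "Amina Yusuf can be reached at +234 802 111 3344 or amina.yusuf@example.ng."
model_entities = [{"entity_group": "private_person", "word": "Amina Yusuf", "score": 1.0}]
merged = add_regex_entities(text, model_entities)
for entity in merged:
    print(entity)
```

Model output is kept as-is and regex hits only fill gaps, so the model's context-dependent labels are never overwritten by a format rule.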

Privacy & Security / 2026
Author: Narcisse Egonu