Defending Against Distillation-Based Model Stealing Attacks
Abstract
Knowledge Distillation (KD) allows a complex teacher network to transfer its knowledge to a simpler student network, improving the student's accuracy. However, KD can also be exploited for model stealing, where adversaries attempt to replicate the teacher network's performance. Inspired by the "Stingy Teacher" model, recent research has shown that sparse outputs can significantly degrade the student model's effectiveness and thus deter model stealing. Building on the "Nasty Teacher" concept, this work presents a method, evaluated on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets, for training a teacher that protects its outputs against intellectual property theft. To strengthen the teacher's defenses, the method mixes sparse outputs derived from adversarial images with the original training data. Additionally, a new loss function, the Exponential Predictive Divergence (EPD) loss, is introduced to obscure the model's outputs without reducing accuracy. The method minimizes the EPD loss between the model's responses to adversarial and clean images, allowing adversarial logits to be produced without harming the network's performance.
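The sparse-output idea referenced above (from the "Stingy Teacher" line of work) amounts to exposing only a few top logits per prediction so a distilling student receives little usable dark knowledge. The sketch below is illustrative only and not the paper's actual procedure; the function name, the choice of `k`, and the top-k masking strategy are assumptions.

```python
import numpy as np

def sparsify_logits(logits, k=2):
    """Keep only the top-k logits and mask the rest, so that after
    softmax the non-top classes carry zero probability mass.

    Illustrative sketch of the 'sparse outputs' defense; `k` and the
    masking scheme are assumptions, not taken from the paper.
    """
    logits = np.asarray(logits, dtype=float)
    masked = np.full_like(logits, -np.inf)      # masked classes -> prob 0
    top = np.argsort(logits)[-k:]               # indices of the k largest logits
    masked[top] = logits[top]
    exp = np.exp(masked - masked[top].max())    # numerically stable softmax
    return exp / exp.sum()

# Only the two largest logits survive; the rest get exactly zero probability.
probs = sparsify_logits([3.0, 1.0, 0.2, -1.5], k=2)
```

A student distilling from such sparse probabilities sees almost none of the inter-class similarity structure that makes soft labels valuable, which is what degrades its accuracy.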