Certifying Robustness of Neural Networks

Certifying Robustness of Neural Networks

Project for the course “Reliable and Interpretable Artificial Intelligence (263-2400-00L)” with Noman Ahmed Sheikh under Prof. Dr. Martin Vechev.

Introduction

We implement a precise, sound and scalable automated verifier for proving robustness of neural networks with rectified linear activations (ReLU) against adversarial attacks based on L ball perturbations. The verifier uses Zonotope abstraction to soundly approximate the behavior of the network as precisely as possible.

Overview

The zonotope domain is exact for affine transformations but loses precision while approximating ReLU functions. The ReLU transformer is parameterized by the slope λ and is sound for any value λ ∈ [0, 1]. Heuristically, the value of λ is chosen as λ = ui/(ui-li) as it minimizes the area of output zonotope (li and ui are lower and upper bounds of ith neuron). However, there may exist transformers with different λ, which can verify more precisely. The choice of optimal lambda should be a function of input image and L ball radius rather than minimizing area.

We propose gradient learning based approach to find optimal λ for each neuron in the network. The objective is to make the existing implementation more precise. We create a network with custom layers that propagates the abstract Zonotope of the input image through a given network. The input zonotope to the network is an L ball of radius ε around the input x. The network has λ as the parameters learnt using gradient descent. The weights of the original network are freezed. An example network is shown in the figure at the top. Here, the input zonotope has 784 error terms.

Below is a list of propositions and improvements that were implemented in the course of developing our verifier.

  • A custom loss function which helps to learn λ’s aiming to certify robustness to the extent possible while being sound.
  • Projecting slopes to lie between 0 and 1 during training.
  • Computing the loss exactly using simple subtraction (which is affine and exact for zonotope domain) rather than directly using interval domain.
  • Adaptive learning rate.

AI2: Safety and Robustness, Certification of Neural Networks with Abstract Interpretation
Fast and Effective Robustness Certification
Report

comments powered by Disqus