Michael Crown
data scientist

GPU Accelerated Convolutional Neural Networks

Including Dropout, Momentum and L2 Regularization.

This is a fully functional Theano-based program for constructing, training, and evaluating convolutional neural networks. Available layer types are convolutional, fully connected, and softmax (a special case of fully connected), so a network is not required to use convolution at all. Networks learn via stochastic gradient descent, with optional dropout, momentum, and L2 regularization.

NOTE: This is not designed to work with Python versions below 3. Also note that I have chosen to use Greek characters for the learning rate (η), momentum (μ), and L2 regularization (λ); just a personal preference.


  • Theano-based programming for GPU acceleration (requires some setup and an Nvidia GPU)
  • Choice of activation function (ReLU, softmax, tanh, sigmoid)
  • Optional max-pooling with convolution layers
  • Dropout
  • Momentum
  • L2 Regularization
  • Works with sparse matrices as input (must be specified in self.sgd() arguments as sparse=True)
  • Optional input of validation and test sets
  • Hyper-parameter tuning/evaluation
    • Returns a list of pairs, each containing a score and an ordered dict of the associated hyper-parameters
  • Save models (and load from save)
    • Saves parameters (bias, weight, and velocity matrices) and some metadata that can be used for reconstructing model architecture
  • Reset function (used to reset parameter values)
    • This is necessary for any instance of re-training (e.g. cross validation)
  • Optional prediction as probabilities
    • Requires output layer to be softmax type
  • K-fold CV (incomplete)
  • Prints status during training
    • Number of batches
    • Current epoch number and fraction complete
  • Prints best epoch scores for validation and test data upon completion
  • Scoring metrics currently include accuracy, precision, recall, and F1
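
The core training mechanics are easiest to see outside of Theano. This NumPy sketch is my own simplification, not the code from convnetwork.py: it shows one SGD step with momentum (μ) and L2 regularization (λ), plus inverted dropout; the library's exact scaling conventions may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_step(w, v, grad, eta=0.02, mu=0.4, lam=0.0, n=50):
    """One SGD update with momentum mu and L2 penalty lam (n = training set size)."""
    # velocity accumulates the momentum term plus the L2-shrunk gradient
    v = mu * v - eta * (grad + (lam / n) * w)
    return w + v, v

def dropout(a, p_drop, train=True):
    """Inverted dropout: zero units with probability p_drop, rescale the survivors."""
    if not train or p_drop == 0.0:
        return a
    mask = rng.random(a.shape) >= p_drop
    return a * mask / (1.0 - p_drop)
```

With μ = 0 and λ = 0 this reduces to plain SGD, and the inverted scaling in the dropout helper means no extra rescaling is needed at prediction time.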

I will be adding a choice of error/loss functions soon, but it hasn't been a priority. The current cost (in both FCLayer and SoftmaxLayer) is log-likelihood. It is easy enough to change the loss function for specific needs if you are familiar with Theano.
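
For reference, the log-likelihood cost amounts to the mean negative log of the probability the network assigns to the true class. A NumPy illustration (the Theano expression inside the layers may be written differently):

```python
import numpy as np

def log_likelihood_cost(probs, y):
    """Mean negative log-likelihood of the true classes.

    probs: (n_samples, n_classes) softmax outputs
    y:     (n_samples,) integer class labels
    """
    return -np.mean(np.log(probs[np.arange(len(y)), y]))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y = np.array([0, 1])
cost = log_likelihood_cost(probs, y)  # -(ln 0.7 + ln 0.8) / 2
```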

There may be some disarray in the code at the moment, mostly because I have made changes on the fly for specific projects. However, it is still very fast and efficient when used with a decent Nvidia GPU.

I will post a notebook demonstrating the usage with MNIST soon, but in the meantime, here is a basic use example:

        # if convnetwork.py is not in current python path:
        # import sys
        # sys.path.append('/path/to/directory/containing/file')

        from convnetwork import ConvNet as CN
        from convnetwork import ConvLayer as cvl
        from convnetwork import FCLayer as fcl
        from convnetwork import SoftmaxLayer as sfl

        # init network with architecture defined by layers=[]
        # (shapes assume valid convolution with 2x2 max-pooling:
        #  28 -> 26 -> 13 in the first layer, 13 -> 10 -> 5 in the second,
        #  so the fully-connected layer sees 64 feature maps of 5x5)
        net = CN(layers=[cvl(inpt_shape=(None, 1, 28, 28), 
                             filter_shape=(32, 1, 3, 3)),
                         cvl(inpt_shape=(None, 32, 13, 13), 
                             filter_shape=(64, 32, 4, 4)),
                         fcl(n_in=64*5*5, n_neurons=32, p_drop=0.2),
                         sfl(n_in=32, n_neurons=10)])

        # train the network using stochastic gradient descent
        train = (Xtr, ytr)
        val = (Xval, yval)
        test = None
        mb_size = 50
        epochs = 30
        learn_rate = 0.02
        momentum = 0.4
        l2_reg = 0.0

        net.sgd(train, val, test, mb_size, epochs, 
                learn_rate, momentum, l2_reg)

        # prints status
        # prints best scores and best epoch after training completes

        # Make a prediction from single or multiple observations x
        # (method names below are illustrative; check convnetwork.py):
        # y_hat = net.predict(x)              # class prediction
        # p_hat = net.predict(x, probs=True)  # probabilities (softmax output layer only)

        # reset parameter values before any re-training (e.g. cross validation)
        # net.reset()

        # then retrain with another call to net.sgd(...)
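
The layer shapes in the example follow from valid convolution plus 2×2 max-pooling (an assumption based on the inpt_shape values above); the arithmetic can be checked with a few lines of plain Python:

```python
def conv_pool_out(size, filt, pool=2):
    """Output side length after a valid convolution and non-overlapping pooling."""
    return (size - filt + 1) // pool

s1 = conv_pool_out(28, 3)  # first conv layer: 28 -> 26 -> 13
s2 = conv_pool_out(s1, 4)  # second conv layer: 13 -> 10 -> 5
```

This is why the fully-connected layer's n_in should equal (number of filters in the last conv layer) × 5 × 5 for 28×28 inputs.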

Get the code here