In a recent paper, I have described how a deep learning network can be structured to behave in the same way as (a component of) the models of particle production that are used for proton-proton collisions at the Large Hadron Collider. The basic idea of a parton shower is that a single parton – that’s a quark or gluon – can be turned into a jet of many partons by repeatedly splitting every parton into two partons. If you start with one parton, you can turn that into a large number of partons by following some simple splitting rules. You can imagine this as a bit like a tree, where you start at the trunk and work your way up towards the branches, and eventually end up on small twigs. I’ve mainly used that analogy as an excuse to include the pretty picture of the trees that from above look a bit like jets of particles.
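The recursive splitting can be sketched in a few lines of python. The uniform energy split below is a toy stand-in for a real QCD splitting kernel, chosen just to show the tree structure:

```python
import random

def shower(energy, cutoff=1.0):
    """Recursively split one parton into two until everything is below a cutoff.

    The uniform choice of the energy fraction z is a hypothetical stand-in
    for a proper splitting kernel.
    """
    if energy < cutoff:
        return [energy]            # too soft to split again: a final-state parton
    z = random.uniform(0.1, 0.9)   # fraction of the energy taken by one daughter
    return shower(z * energy, cutoff) + shower((1.0 - z) * energy, cutoff)

jet = shower(100.0)  # one 100-unit parton becomes a whole jet of soft partons
```

Every parton either stops (below the cutoff) or splits into two, so the total energy is shared out over the final twigs of the tree, exactly like working from trunk to branches.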
This way of thinking about the world, that everything can be made by repeating the same structures over and over at different scales, is very appealing, and it turns out to be involved both in fundamental physics models, via the renormalisation group, and in deep learning methods, which tend to learn by extracting hierarchies of details from their input data. People have even shown that deep learning models employ the same sort of structures as fundamental physics models (https://arxiv.org/abs/1410.3831). I find that quite spooky, given that our own brains, the things that interpret the world and invent physics models, are probably also deep learning machines. In some sense, our best, most fundamental models of the world are a re-expression of how our own brains learn features from the world. Weird.
All this raises the interesting question of whether we can make a deep learning model that directly incorporates the features we know should be there in particle production. Obviously we can, because I wouldn’t be writing this blog otherwise. With recent python tools, it doesn’t even take much code to implement a decent stab at a parton shower – you can (more or less) do it in 100 lines of python. Take a look at the schematic of the convolutional neural network below.
This is an autoencoder. The idea is that it has to be able to predict its own input (which will be images of particle emission patterns), but there is a bottleneck in the middle of the network so the amount of information available to store each event is less than you started out with when the event was fed in. This means the network has to learn features – which could be shower splitting kernels – from the input data and use them to compress the input patterns.
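As a back-of-envelope illustration of how tight that bottleneck is, here is a sketch using the hyper-parameter values that appear later in the post (64-pixel images, 2×2 kernels, a bank of 9 filters):

```python
# a back-of-envelope look at the bottleneck, using the hyper-parameter
# values that appear later in the post
nPixels, kernelSize, nConvolutions = 64, 2, 9

input_values = nPixels * nPixels                 # 4096 numbers enter the network
bottleneck_values = 2 * 2                        # only a 2x2 image survives the squeeze
shared_weights = nConvolutions * kernelSize**2   # 36 weights shared by every scale
```

A 4096-value event has to squeeze through a handful of numbers plus 36 shared weights, so the network has no choice but to learn reusable features rather than memorise events.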
Since the input is an image of the radiation pattern, the kernels that will be learnt are convolutions. We also want recursion, which ensures self-similarity across different angular scales. The structure to compress and learn from the data is therefore: pass the input image through a bank of convolutional filters, downsample the output of those convolutions, then pass the result through the same bank of filters again. Keep doing this until the image is fully compressed, at which point the information lives almost entirely in the learned convolutions. From this compressed bottleneck, we re-inflate the image using another repeated bank of (different) deconvolutions. There are some other tricks in the model, which I may describe in more detail in a future post. You can see the set of python calls to create this model here, but I’ll describe some of the basic features.
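The compress half of that loop can be caricatured in pure numpy, with one shared 2×2 filter standing in for the whole bank (the function names here are mine, not from the model):

```python
import numpy as np

def convolve2x2(image, kernel):
    """A 'same'-padded 2x2 convolution, written with explicit loops for clarity."""
    n = image.shape[0]
    padded = np.pad(image, ((0, 1), (0, 1)))  # pad bottom/right to keep the size
    out = np.zeros_like(image)
    for i in range(n):
        for j in range(n):
            out[i, j] = np.sum(padded[i:i + 2, j:j + 2] * kernel)
    return out

def maxpool2(image):
    """Downsample by taking the maximum of each non-overlapping 2x2 window."""
    n = image.shape[0]
    return image.reshape(n // 2, 2, n // 2, 2).max(axis=(1, 3))

image = np.random.rand(64, 64)   # a fake 64x64 radiation pattern
kernel = np.random.rand(2, 2)    # ONE filter standing in for the whole bank

x = image
for level in range(5):           # reuse the SAME kernel at every angular scale
    x = maxpool2(convolve2x2(x, kernel))
# after 5 levels the 64x64 image has been squeezed down to 2x2
```

The key point is that the same `kernel` is applied at every level, which is what builds the self-similarity into the model.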
First, this function defines the hyper-parameters of the model we’re building:
def buildModel(nPixels=64, kernelSize=2, nConvolutions=9, lr=5.e-5,
               lossWeights=[100, 10, 1], kernel_regularisation=100.,
               nGPUs=2, merge_shower=False):
    nLevels = int(round(math.log(nPixels, kernelSize))) - 1  # int so it can feed range() below
The interesting parameters here are the number of pixels across the input image (64), the size of the convolutional kernel (2), and the number of convolutional kernels to use in the filter bank (9). Since each downsampling of the image is by a factor of the kernel size, this implies that the number of times we can pass our input through the convolution-downsample cycle is log(nPixels) / log(kernelSize) – six for a 64-pixel image with a kernel size of 2 – and since we don’t want to quite compress things to a single pixel, we reduce that by one. I’ve labelled that as nLevels above, to signify the number of convolutional levels.
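For concreteness, here is that counting with the default values (the round() is just a guard against floating-point error in the logarithm):

```python
import math

nPixels, kernelSize = 64, 2

# each convolve-and-downsample cycle shrinks the image by a factor of kernelSize,
# so the image sizes run 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1
nCycles = int(round(math.log(nPixels, kernelSize)))  # 6 cycles would hit one pixel
nLevels = nCycles - 1                                # stop one short, at 2x2
```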
Then we define our pixel array inputs, and set up the banks of convolutional and deconvolutional filters
input = Input(shape=(nPixels, nPixels, 1, ))
conv = []
deconv = []
for i in range(nConvolutions):
    conv.append(Conv2D(filters=1, kernel_size=kernelSize, padding='same',
                       kernel_initializer='glorot_normal', use_bias=False,
                       kernel_regularizer=utils.regularisers.energyConservation(kernel_regularisation)))
    deconv.append(Conv2DTranspose(filters=1, kernel_size=kernelSize, strides=kernelSize,
                                  padding='same', kernel_initializer='glorot_normal',
                                  use_bias=False))
shrink = MaxPooling2D(pool_size=kernelSize)
Pay attention to the padding and stride options in the filter instantiation. These ensure the output of the filter is the correct dimension. The shrink layer is defined to down-sample the image by taking the maximum pixel in a window the same size as the kernel. We only need one such layer, since it can be reused.
It helps to define a short function to apply the convolutions:
def applyConvolutions(events, convolutions):
    outputs = []
    for c in convolutions:
        filtered = c(events)
        outputs.append(filtered)
    maxOutput = keras.layers.maximum(outputs)
    maskInput = keras.layers.concatenate(outputs, axis=3)
    maskOutput = utils.layers.FilterMask()(maskInput)
    return maxOutput, maskOutput
This uses Keras’ maximum layer to take the largest filter output at each pixel site. It also constructs a custom masking layer that will be used to pick which of the later deconvolutional filters to use at each pixel site. Then we just apply our filters and downsampling nLevels times to compress the event
preShower = []      # initialisations implied by the appends below
filterMasks = []
events = input      # start from the input image
for i in range(nLevels):
    if merge_shower:
        preShower.append(events)
    events, mask = applyConvolutions(events, conv)
    events = shrink(events)
    filterMasks.append(mask)
The option about merge_shower is for CKKW-like merging, which I may explain in a future post. The main point to take away is that the input events are repeatedly convolved using the same filters and shrunk after each filtering.
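The maximum-and-mask bookkeeping inside applyConvolutions is easy to sketch in plain numpy (the array names here are mine, and three filters on an 8×8 image stand in for the real bank):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the outputs of 3 convolutional filters on an 8x8 image,
# stacked along a channel axis (axis 3 in the real 4D Keras tensors)
outputs = rng.random((8, 8, 3))

# per-pixel maximum over the filter bank: the job of keras.layers.maximum
maxOutput = outputs.max(axis=-1)

# a one-hot record of WHICH filter won each pixel: the role of the custom
# FilterMask layer; the deconvolution side reuses this to route every pixel
# through the matching inverse filter
mask = (outputs == outputs.max(axis=-1, keepdims=True)).astype(float)
```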
Then we do the reverse and run a loop that repeatedly applies the deconvolutional filters until we are back at the original image size
for i in range(nLevels):
    events = applyDeconvolutions(events, deconv, filterMasks[-1-i])
    if merge_shower:
        events = showerMerge[i]([preShower[-i-1], events])
Again the showerMerge stuff is related to merging the shower model with a fixed order calculation, which I don’t want to get into now. Suffice to say there is a custom Keras layer that does the merging, but only during evaluation time, not during model training. The applyDeconvolutions function is pretty similar to the applyConvolutions function
def applyDeconvolutions(events, deconvolutions, mask):
    applied = []
    for d in deconvolutions:
        applied.append(d(events))
    events = keras.layers.concatenate(applied, axis=3)
    events = keras.layers.multiply([events, mask])
    events = utils.layers.MergeDeconvolution()(events)
    return events
The only difference is that it multiplies the output of the set of filters by a masking tensor, which controls which deconvolutional filter will produce output at each pixel site. This mask was determined earlier during the applyConvolutions function call. The final MergeDeconvolution layer simply sums over the masked output of the filters.
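The mask-multiply-and-sum step can be sketched in numpy with a made-up mask (the real one comes from the FilterMask layer during compression):

```python
import numpy as np

rng = np.random.default_rng(1)
deconv_out = rng.random((8, 8, 3))   # stacked outputs of 3 deconvolutional filters

# a made-up mask: pretend filter 0 won the top half and filter 2 the bottom
mask = np.zeros((8, 8, 3))
mask[:4, :, 0] = 1.0
mask[4:, :, 2] = 1.0

# multiply by the mask, then sum over the filter axis, so only the winning
# filter contributes at each pixel (the summing is what MergeDeconvolution does)
merged = (deconv_out * mask).sum(axis=-1)
```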
Finally we can simply compile our autoencoding model, ready to train. I’ve had access to a pair of GPUs, so we’ll compile for multi-GPU setups if possible.
if nGPUs > 0:
    with tf.device('/cpu:0'):
        cpumodel = Model(inputs=input, outputs=events)
    model = multi_gpu_model(cpumodel, gpus=nGPUs)
else:
    model = Model(inputs=input, outputs=events)

model.compile(loss=[utils.losses.activeLoss(lossWeights)],
              optimizer=Nadam(lr=lr))
And that’s it. The basic design of a model that recreates a parton shower is pretty simple, though some of the custom layers and losses require a bit more work.