Autoencoder represents a multi-dimensional smooth function
Setting up a simple Autoencoder neural network to reproduce a dataset obtained by sampling a multi-dimensional smooth function f=f(x_1,x_2,x_3,x_4). As an example I’m using a disc+halo rotation curve model where both components are described by 2-parameters circular velocities
neural_network
autoencoder
basics
jupyter
Author
Lorenzo Posti
Published
June 10, 2022
Autoencoder: an algorithm to learn a representation of the dataset
In this piece I’m interested in a neural network that is not designed to predict an outcome given a dataset, but rather in an algorithm that is capable of learning an underlying - more compressed - representation of the dataset at hand. This can be used for a number of excitng applications such as data compression, latent representation or recovering the true data distribution. An autoencoder, which is a special encoder/decoder network, is an efficient algorithm that can achieve this. It is divided in two parts: - encoder: the algorithm learns a simple (lower dimensional) representation of the data (code) - decoder: starting from an instance of such representation, code, the algorithm develops it returning the data in the orignal (higher dimensional) form.
The Autoencoder is then a map $ x x$, where the first part is the encoder and the second is the decoder; thus it is effectively a map of \(x\) onto itself. This implies 1) that we can use the data itself \(x\) in the loss function and 2) that the algorithm is learning a representation of the dataset itself.
A smooth multi-dimensional function: rotation curve model
To test an autoencoder network we’ll use a simple rotation curve model that is the superposition of two massive components, a disc and an halo. Both components have circular velocity curves that depend on two parameters, a mass and a physical scale, such that the total model has 4 parameters, 2 for each component.
This is a nice case to test how an autoencoder learns an underlying lower dimensional representation of a dataset, since each rotation curve that we will supply to it is actually derived sampling a smooth function of just 4 parameters.
Sampling uniformly in disc mass, between \(10^9\) and \(10^{12}\) solar masses, and generating random samples of disc size, halo mass, and halo concentration following the above scaling relations.
Above we have generated our latent representation of the dataset, that is each galaxy model is represented by a quadruple (ms,rd,mh,cc). Below we construct a class to generate the rotation curve of the corresponding galaxy model.
class curveMod():def__init__(self, Md, Rd, Mh, cc, rad=np.logspace(-1, np.log10(50), 50)):self.G, self.H, self.Dc =4.301e-9, 70, 200.# physical constantsself.Md, self.Rd = Md, Rdself.Mh, self.cc = Mh, ccself.rad = radifhasattr(self.Md, '__len__'):self.vdisc = [self._vdisc(self.rad, self.Md[i], self.Rd[i]) for i inrange(len(self.Md))]self.vdm = [self._vhalo(self.rad, self.Mh[i], self.cc[i]) for i inrange(len(self.Md))]self.vc = [np.sqrt(self.vdisc[i]**2+self.vdm[i]**2) for i inrange(len(self.Md))]else:self.vdisc =self._vdisc(self.rad, self.Md, self.Rd)self.vdm =self._vhalo(self.rad, self.Mh, self.cc)self.vc = np.sqrt(self.vdisc**2+self.vdm**2)def _fc(self, x): return np.log(1+x)-x/(1+x)def _Vvir(self, Mh): return np.sqrt((self.Dc*(self.H)**2/2)**(1./3.) * (self.G*Mh)**(2./3.))def _Rvir(self, Mh): return1e3* (Mh / (0.5*self.Dc*self.H**2/self.G))**(1./3.)def _vhalo(self, R, Mh, cc):# circular velocity of the halo component (NFW model) rv =self._Rvir(Mh)return np.sqrt(self._Vvir(Mh)**2*rv/R*self._fc(cc*R/rv)/self._fc(cc)) def _vdisc(self, R, Md, Rd):# circular velocity of the disc component (exponential disc) y = R/2./Rdreturn np.nan_to_num(np.sqrt(2*4.301e-6*Md/Rd*y**2*(i0(y)*k0(y)-i1(y)*k1(y))))
We can now initialize this class with the samples (ms,rd,mh,cc) above, which will store inside the class the rotation curves of all the models - defined on the same radial scale (np.logscale(-1, np.log10(50), 50)) by default.
cm=curveMod(ms,rd,mh,cc)
Let’s plot all the rotation curves together:
for v in cm.vc: plt.plot(cm.rad, v)plt.xlabel('radius')plt.ylabel('velocity');
The Autoencoder network
We can now build our Autoncoder class deriving from nn.Module. In this example, each rotation curve is a collection of 50 values (see the radial grid in curveMod) and we want to compress it down to a code of just 4 numbers. Notice that we start from the simplified case in which we assume to already know that the ideal latent representation has 4 parameters.
The encoder network has 3 layers, going from the initial \(n=50\) to \(32\), then to \(16\), and finally to \(4\). The decoder is symmetric, going from \(n=4\) to \(16\), then to \(32\), and finally to \(50\).
After 1000 epochs of trainig we get down to a low and stable MSE on the validation set. We can now compare the actual rotation curves in the validation set with those decoded by the model, finding an impressively good match!
fig,ax = plt.subplots(figsize=(12,4), ncols=2)for v in datascale(xvalid,xmean,xstd): ax[0].plot(cm.rad, v)for v in datascale(ae.forward(xvalid),xmean,xstd): ax[1].plot(cm.rad, v.detach().numpy())ax[0].set_xlabel('radius'); ax[1].set_xlabel('radius')ax[0].set_ylabel('velocity');
How does the code correlate with the original 4 physical parameters?
Finally we can explore a bit the properties of the 4 values encoded by the autoencoder and try to understand how do they relate to the original 4 physical parameters. Initially we generated each rotation curve starting from a 4-ple in (ms, mh, rd, cc), where these numbers are generated from some well known scaling relations. We plot the distribution of the initial 4 parameters here:
Now we can ask ourselves if similar correlations are observed in the 4 parameters coded by the autoencoder. In principle these are some other 4 numbers that fully specify a single rotation curve model, but that do not necessarily have anything to do with the original (ms, mh, rd, cc).
We can see that the coded parameters are indeed strongly correlated among themselves, however it is difficult to draw parallelisms with the behaviour we see in the original 4 parameters. An important difference to notice in these plots is that here we do not use a logarithmic scale for the 4 coded parameters, since they also take negative values unlike (ms, mh, rd, cc).
Let’s now have a look at how the 4 coded parameters are correlated with the original ones. This is interesting since while the 4-dimensional latent space found by the autoencoder is not necessarily the original space of (ms, mh, rd, cc), the 4 new parameters might be well correlated with the 4 original physical quantites.
Code
mdshuff, mhshuff = [cm.Md[i] for i in idshuff], [cm.Mh[i] for i in idshuff]rdshuff, ccshuff = [cm.Rd[i] for i in idshuff], [cm.cc[i] for i in idshuff]ith =int(nsamp*(1.0-fval))mdtrain, mhtrain = mdshuff[:ith], mhshuff[:ith]rdtrain, cctrain = rdshuff[:ith], ccshuff[:ith]mdvalid, mhvalid = mdshuff[ith:], mhshuff[ith:]rdvalid, ccvalid = rdshuff[ith:], ccshuff[ith:]partrain = (np.vstack([mdtrain, mhtrain, rdtrain, cctrain]).T)parvalid = (np.vstack([mdvalid, mhvalid, rdvalid, ccvalid]).T)
Plotting the mutual correalations in the training set we do see that the 4 coded parameters are not at all randomly related to the original physical quantities.
To finish, let’s have a look at how we can use the autoencoder to generate new fake data that resembles our original dataset. We do so by random sampling from the distribution of coded values that we obtained during training. In this way we generate a new plausible code which we then decode to construct a new rotation curve.
We start by generating new code from the distribution obtained during training - to do this we use numpy.random.choice
size=500new_pp_ae = []for i inrange(4): new_pp_ae.append(tensor(np.random.choice(pp_ae[:,i].numpy(), size)))new_code = torch.stack(new_pp_ae).T
Let’s plot the original and new distributions of coded parameters:
Since they look very much alike we can now decode the new code that we just generated and we are ready to plot the new rotation curves
fig,ax = plt.subplots(figsize=(6,4))for v in datascale(ae.decoder(new_code),xmean,xstd): ax.plot(cm.rad, v.detach().numpy())ax.set_xlabel('radius')ax.set_ylabel('velocity');
With this method we have effectively generated new rotation curves that are realistic and are not part of the training dataset. This illustrates the power of autoencoders, however to do this even better we can adapt our autoencoder to learn the underlying distribution in the code space - this is what Variatioal Autoencoders (VAEs) are for!