After the intro to diffusion models, let's take the next step and introduce conditioning in these simplified models, again from scratch. We will build a model able to generate rotation curves with specific features that we can specify upfront.
neural network
diffusion
jupyter
Author
Lorenzo Posti
Published
January 17, 2023
How to generate new data with specific characteristics
In the last notebook I showed how to write from scratch a simplified diffusion model that generates quite realistic rotation curves starting from random noise. This was done by training an autoencoder to denoise rotation curves from a given database; new curves were then generated by gradually denoising pure Gaussian noise.
The limitation of the pipeline above is that we have no control over which output curve will be generated, since it all starts with random noise and the model slowly enhances some random features in the noise distribution by convincing itself that they are an underlying signal. What we would like to do now is to guide the model to generate rotation curves with some specific characteristics – asymptotically flat, fast rise, slow rise, etc. – that we decide upfront. Basically, we are asking the model to focus on and enhance those bits of random noise that kind of look like the features we are interested in; by using the same gradual denoising procedure based on small timesteps, the algorithm will be able to further refine them and finally construct a realistic dataset with some given characteristics.
Let's start by generating a rotation curve dataset. This step is borrowed from the blog post intro to diffusion models and I use the same methodology and the same code. However, to spice things up a bit, here I introduce another mass component to the galaxy rotation curves: the bulge. Thus, I have a new mass–size relation for bulges that I use to sample the parameter space, and I add the circular velocity of a Hernquist bulge into the calculation of the rotation curves.
Galaxies with significantly different mass and size ratios of the disc and bulge components have quite different rotation curves, with distinct recognizable features – e.g. high bulge-fraction galaxies have rotation curves that rise very steeply, have a central peak, and then decline fast, while those with low bulge fractions have slowly rising curves. The goal of this notebook is then to train a model that can generate rotation curves with some of these specific features, which we can specify upfront.
I now need to introduce two more free parameters, the mass and size of the bulge, to generate a galaxy model. While I sample both disc and bulge masses from uniform distributions, I take the bulge size to lie on a given scaling relation, in analogy to what I've done for the disc component.
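For concreteness, here is a minimal sketch of how such a parameter sampling could look. The distributions, ranges, and scaling-relation coefficients below are illustrative assumptions, not the exact values of the original notebook:

import numpy as np

rng = np.random.default_rng(42)
N = 1000                                    # number of galaxy models

md = 10**rng.uniform(9.5, 11.0, N)          # disc mass [Msun], uniform in log
mb = 10**rng.uniform(8.5, 10.5, N)          # bulge mass [Msun], uniform in log
mh = 10**rng.uniform(11.0, 13.0, N)         # halo mass [Msun], uniform in log
cc = rng.uniform(5., 15., N)                # halo concentration

# sizes drawn around assumed mass-size relations (slopes/normalizations are placeholders)
rd = 10**(0.3*(np.log10(md)-10.5) + rng.normal(0.5, 0.1, N))   # disc scale length [kpc]
rb = 10**(0.4*(np.log10(mb)-10.5) + rng.normal(0.0, 0.1, N))   # bulge scale radius [kpc]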
import numpy as np
from scipy.special import i0, i1, k0, k1

class curveMod():
    def __init__(self, Md, Rd, Mb, Rb, Mh, cc, rad=np.logspace(-1, np.log10(50), 50)):
        self.G, self.H, self.Dc = 4.301e-9, 70, 200.  # physical constants
        self.Md, self.Rd = Md, Rd
        self.Mb, self.Rb = Mb, Rb
        self.Mh, self.cc = Mh, cc
        self.rad = rad

        if hasattr(self.Md, '__len__'):
            self.vdisc  = [self._vdisc(self.rad, self.Md[i], self.Rd[i]) for i in range(len(self.Md))]
            self.vbulge = [self._vbulge(self.rad, self.Mb[i], self.Rb[i]) for i in range(len(self.Md))]
            self.vdm    = [self._vhalo(self.rad, self.Mh[i], self.cc[i]) for i in range(len(self.Md))]
            self.vc     = [np.sqrt(self.vdisc[i]**2 + self.vbulge[i]**2 + self.vdm[i]**2)
                           for i in range(len(self.Md))]
        else:
            self.vdisc  = self._vdisc(self.rad, self.Md, self.Rd)
            self.vbulge = self._vbulge(self.rad, self.Mb, self.Rb)
            self.vdm    = self._vhalo(self.rad, self.Mh, self.cc)
            self.vc     = np.sqrt(self.vdisc**2 + self.vbulge**2 + self.vdm**2)

    def _fc(self, x):
        return np.log(1+x) - x/(1+x)

    def _Vvir(self, Mh):
        return np.sqrt((self.Dc*self.H**2/2)**(1./3.) * (self.G*Mh)**(2./3.))

    def _Rvir(self, Mh):
        return 1e3 * (Mh / (0.5*self.Dc*self.H**2/self.G))**(1./3.)

    def _vhalo(self, R, Mh, cc):
        # circular velocity of the halo component (NFW model)
        rv = self._Rvir(Mh)
        return np.sqrt(self._Vvir(Mh)**2 * rv/R * self._fc(cc*R/rv)/self._fc(cc))

    def _vdisc(self, R, Md, Rd):
        # circular velocity of the disc component (exponential disc)
        y = R/2./Rd
        return np.nan_to_num(np.sqrt(2*4.301e-6*Md/Rd*y**2*(i0(y)*k0(y)-i1(y)*k1(y))))

    def _vbulge(self, R, Mb, Rb):
        # circular velocity of the bulge component (Hernquist bulge)
        return np.sqrt(self.G*1e3*Mb*R) / (R+Rb)
cm = curveMod(md, rd, mb, rb, mh, cc)
Finally we can plot some of these rotation curves:
for i, v in enumerate(cm.vc):
    if i % 10 == 0:
        plt.plot(cm.rad, v)
plt.xlabel('radius')
plt.ylabel('velocity');
Even though it does not appear clearly from the plot above, the rotation curves in this database are much more varied in shape than those in the bulge-less database that I used in the previous intro post on diffusion models.
Normalization and train/valid splitting
As always, it is important to normalize and shuffle the data and to create a training/validation split.
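Something like the following sketch would do; the normalization choice (dividing by the global maximum velocity) and the 80/20 split fraction are assumptions:

import torch
import numpy as np

vc = np.array(cm.vc)
x = torch.tensor(vc / vc.max(), dtype=torch.float32)   # normalize velocities to [0, 1]

idx = torch.randperm(x.shape[0])                       # shuffle the galaxies
x = x[idx]

nt = int(0.8 * x.shape[0])                             # 80/20 train/validation split
xtrain, xvalid = x[:nt], x[nt:]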
To include conditioning in the autoencoder network I do not need to change the model much: I can get away with just some minimal modifications. This is not exactly what state-of-the-art diffusion models do (they condition on learned embeddings, e.g. from CLIP), but it is identical in spirit and it is much simpler to write and understand.
The basic idea is to pass as input to the autoencoder not only the thing we want to reproduce (in this case the rotation curve tensor) but also a tensor of properties of the input. This can be a simple array of numbers, as in this case, or a more complicated embedding layer, but the principle is that the autoencoder is given a priori some additional information that further characterizes the input. Once the model is trained, it has learned to recognize these additional properties, so when it has to generate a new sample from pure noise it can use this additional tensor of properties to condition and guide its denoising process.
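In code, this principle is nothing more than feature concatenation; schematically (shapes are illustrative):

# x: (batch, ninp) rotation curves; cond: (batch, ncond) galaxy properties
x_in = torch.cat((x, cond), dim=1)   # shape (batch, ninp + ncond), fed to the first layer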
Conditioning tensors
I select four properties that I use as conditioning: the bulge over stellar mass ratio, the stellar over halo mass ratio, the disc over bulge size ratio, and the stellar mass. While the latter specifies the absolute scale of the galaxy, the other three are relative parameters that are able to constrain the shape of the rotation curve relatively well.
This tensor will be concatenated to the input, i.e. the rotation curve tensor, to feed the autoencoder.
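A sketch of how this tensor can be built from the sampled parameters above (the log/linear choices follow the figure caption later in the post, but whether and how these columns are further rescaled for training is an assumption):

mstar = md + mb                               # total stellar mass [Msun]
cond = torch.tensor(np.stack([
    mb / mstar,                               # bulge over stellar mass ratio
    np.log10(mstar / mh),                     # stellar over halo mass ratio (log)
    np.log10(rd / rb),                        # disc over bulge size ratio (log)
    np.log10(mstar),                          # stellar mass (log)
], axis=1), dtype=torch.float32)

# same shuffling and split as for the rotation curves
cond = cond[idx]
cond_train, cond_valid = cond[:nt], cond[nt:]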
Model
import torch
import torch.nn as nn

class AE_cond_net(nn.Module):
    def __init__(self, ninp, ncond=4, **kwargs):
        super().__init__()
        self.encoder_layers = nn.ModuleList([
            nn.Linear(in_features=ninp+ncond, out_features=32),
            nn.Linear(in_features=32, out_features=16),
            nn.Linear(in_features=16, out_features=6)
        ])
        self.decoder_layers = nn.ModuleList([
            nn.Linear(in_features=6, out_features=16),
            nn.Linear(in_features=16, out_features=32),
            nn.Linear(in_features=32, out_features=ninp)
        ])
        self.act = nn.SiLU()  # the activation function

    def forward(self, x, cond):
        h = []  # skip connections
        for i, l in enumerate(self.encoder_layers):
            # concatenate x and cond in the 1st layer
            x = self.act(l(x)) if i > 0 else self.act(l(torch.cat((x, cond), 1)))
            if i < 2:
                h.append(x)  # store for skip connection, for all but final layer
        for i, l in enumerate(self.decoder_layers):
            if i > 0:
                x += h.pop()  # get stored output for skip connection, for all but first layer
            x = self.act(l(x)) if i < 2 else l(x)  # final layer without activation
        return x
This model is quite similar to the unconditioned autoencoder that I used in the previous blog post; the only difference is the first layer, which now takes four more inputs.
model = AE_cond_net(len(cm.rad))
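The training loop below relies on the add_noise helper from the previous notebook. As a reminder, a minimal sketch of such a corruption step (the exact form used in the original post may differ) is:

def add_noise(x, amount):
    # mix each curve with Gaussian noise: amount=0 keeps the data, amount=1 is pure noise
    noise = torch.randn_like(x)
    amount = amount.view(-1, 1)   # broadcast the per-sample noise level over the radial bins
    return x * (1 - amount) + noise * amount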
# Adam optimizer and MSE loss
loss_func = nn.MSELoss(reduction='mean')
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(2001):
    # generate random noise amount
    noise = torch.rand(xtrain.shape[0])
    # add noise to data
    x_noisy = add_noise(xtrain, noise)
    # prediction
    ymod = model.forward(x_noisy, cond_train)
    # loss
    loss = loss_func(xtrain, ymod)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 100 == 0:
        print(epoch, "train L:%1.2e" % loss,
              " valid L:%1.2e" % loss_func(xvalid, model.forward(xvalid, cond_valid)))
Now that the model is trained, let's use the sampling technique that we developed for the previous notebook to generate some new data. This is the pipeline that gradually denoises an input that is pure Gaussian random noise, but this time we can ask the model to condition the denoising procedure on a feature vector that we pass to it.
Let’s start from the conditioning tensor: let’s go for a high bulge fraction, dark matter dominated, small galaxy:
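Here is a minimal sketch of the conditioned sampler, mirroring the unconditioned one of the previous notebook; the mixing schedule and the raw conditioning values are assumptions (in practice the conditioning vector should be built and rescaled exactly as during training):

# high bulge fraction, dark matter dominated, small galaxy:
# Mb/Mstar, log(Mstar/Mhalo), log(Rdisc/Rbul), log(Mstar)
cond = torch.tensor([[0.75, -1.1, 0.8, 10.0]])

x = torch.randn(1, len(cm.rad))              # start from pure Gaussian noise
n_steps = 30
for i in range(n_steps):
    with torch.no_grad():
        pred = model(x, cond)                # the model's guess of the clean curve
    mix = 1. / (n_steps - i)                 # move a growing fraction towards the prediction
    x = x*(1 - mix) + pred*mix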
We have successfully created a model that takes as input four numbers, which specify some properties of a galaxy, and returns as output a realistic rotation curve for that galaxy. The gradual denoising pipeline that we used here is great since it allows us to simply try another initial random tensor and get out a different rotation curve, but with the same features that we specified. This is useful to generate multiple instances of the same galaxy type.
In the Figure above we can see the temporal sequence of gradual denoising conditioned by the four input parameters in 30 steps. The generated rotation curve in the final step (orange curve, rightmost plot) is compared to the circular velocity of the bulge component (red dotted) and of the dark matter component (black dotted) of a galaxy model with parameters close to those used for the curve generation: \(M_{\rm bul}/M_\star = 0.75\), \(\log(M_\star/M_{\rm halo})=-1.1\), \(\log(R_{\rm disc}/R_{\rm bul})=0.8\), \(\log M_\star = 10\).
Naturally, we can readily generate new samples of all sorts of galaxy rotation curve types.
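For instance, here is a hedged sketch of looping over a few conditioning vectors (the values are illustrative, going from disc-dominated to bulge-dominated):

conds = torch.tensor([[0.05, -1.5, 1.2, 10.5],
                      [0.30, -1.2, 1.0, 10.2],
                      [0.75, -1.1, 0.8, 10.0]])

samples, n_steps = [], 30
for cond in conds:
    x = torch.randn(1, len(cm.rad))
    for i in range(n_steps):
        with torch.no_grad():
            pred = model(x, cond.unsqueeze(0))
        mix = 1. / (n_steps - i)
        x = x*(1 - mix) + pred*mix
    samples.append(x.squeeze())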
As you can see, multiple instances of virtually any kind of rotation curve can now be generated by the model by gradually denoising initial Gaussian noise.
Final remarks
The rotation curves that I used here are a 6-parameter family, so I kept the number of conditioning features smaller than this (i.e. 4) just to make sure that the model would not trivially learn to map the full rotation curve tensor onto the initial 6 parameters. In other words, had we given the model 6 conditioning features, it would probably have mapped the initial 6 parameters with an identity onto the 6-dimensional latent space, and then the decoder would have learned the analytic forms of the curves from these 6 parameters. Hence, by using only 4 conditioning numbers I am sure that there is an infinite number of curves that the model can generate for a given conditioning input.
Naturally, this exercise could have been solved more easily without using this denoising scheme. However, it is useful to do it this way since it mirrors very well the principles that are used in state-of-the-art diffusion models.