Rotation curve decompositions with Gaussian Processes: taking into account data correlations leads to unbiased results
Correlations between velocity measurements in disk galaxy rotation curves are usually neglected when fitting dynamical models. This notebook, which accompanies the paper Posti (2022), Res. Notes AAS, 6, 233, shows how data correlations can be taken into account in rotation curve decompositions using Gaussian Processes.
If you use parts of the code found in this notebook please cite the paper Posti (2022), Res. Notes AAS, 6, 233.
Copyright (c) 2022, Lorenzo Posti. The code is distributed under BSD-style license and it can be copied and used.
Galaxy rotation curves are usually modelled by assuming that each datapoint in the curve is independent from the others. However, this is naturally just a first order approximation, since observational effects such due to geometrical projection and resolution, as well as physical effects such as non-circular motions, can make the velocities measured in two adjacent annnuli significantly correlated.
In this notebook I use the rotation curve of NGC 2403 as a test case to show how to include Gaussian Processes (GPs) in rotation curve decomposition models, in order to account for underlying data correlations. More details can be found in the accompanying paper Posti (2022), Res. Notes AAS, 6, 233.
import numpy as np
import matplotlib
import matplotlib.pylab as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable
import random
import jax
import jax.numpy as jnp
import jaxopt
from functools import partial
from tinygp import GaussianProcess, kernels
import numpyro
import arviz
import corner
jax.config.update("jax_enable_x64", True)
import warnings
warnings.filterwarnings('ignore')
from matplotlib import rc
rc('text', usetex=True)
%config Completer.use_jedi = False
%matplotlib inline
Here I introduce the data for the galaxy NGC 2403 and the functions needed to work with Eq. (1) in the paper.
Note that the code below works also for any other galaxy whose data are formatted in the same way, e.g. it works with no modifications needed for all galaxies in the SPARC catalog (Lelli et al. 2016).
I start with some definitions to specify $V_{\rm DM}(R)$, the contribution of DM to the circular velocity in Eq. (1). I assume NFW profiles for the DM halos, which are specified by two parameters: halo mass $M_{\rm h}$ and concentration $c$. $M_{\rm h}$ is the virial mass defined with a critical overdensity of $\Delta_{\rm c}=200$.
G = 4.301e-9 # gravitational constant, in Mpc km^2 s^-2 Msun^-1
H = 70. # Hubble's constant, in km s^-1 Mpc^-1
Dc= 200. # critical overdensity
Note that below I use jax
, and not numpy
, to define these functions. This is needed in order to do model inference with numpyro
.
def jax_fc(x): return jnp.log(1+x)-x/(1+x)
# definitions of virial velocity and virial radius
def jax_Vvir(Mh): return jnp.sqrt((Dc*(H)**2/2)**(1./3.) * (G*Mh)**(2./3.))
def jax_Rvir(Mh):
rho_hat = 4. / 3. * np.pi * Dc * (3. * (H)**2 / (8. * np.pi * G))
return 1e3 * ((Mh / rho_hat)**(1./3.))
# V_DM(R) for an NFW halo
def jax_vhalo(params, R):
Mh, cc = 10**params['log_mh'], 10**params['log_c']
rv = jax_Rvir(Mh)
return jnp.sqrt(jax_Vvir(Mh)**2*rv/R*jax_fc(cc*R/rv)/jax_fc(cc))
Finally, below I implement the model rotation curve in Eq. (1). Two things are worth pointing out:
- I use a linear interpolation for the baryonic part of the curve, since I assume that $V_{\rm gas}$ and $V_\star$ are measured at fixed radii, so that
params['r']
,params['vg']
etc. are expected to be arrays of the same size. - I decomposed $V_\star$ in the bulge and disk components, with two different mass-to-light ratios. While my test case NGC 2403 has no bulge, it is good to include it here for the sake of generality. The SPARC catalog indeed includes stellar circular velocities decomposed into bulge and disk.
def jax_vmod(params, R):
return jnp.sqrt(jax_vhalo(params, R)**2 + # DM
jnp.interp(R, params['r'], params['vg']**2 + # gas
10**params['log_mld']*params['vd']**2+10**params['log_mlb']*params['vb']**2 # stars
))
I take the rotation curve data of NGC 2403 from the SPARC catalog at face value. I include a copy of the file here for convenience.
r, vobs, e_vobs, vg, vd, vb, _, _ = np.genfromtxt('data/NGC2403_rotmod.dat', unpack=True)
Let's plot the data below together with the best-fit model obtained by Posti et al. (2019) for reference.
params = {
# parameters of the best-fit in Posti et al. (2019)
'log_mh' : 11.4,
'log_c' : 1.14,
'log_mld': -0.377,
'log_mlb': -99., # this galaxy has no bulge
# data arrrays
'r' : r,
'vg' : vg,
'vd' : vd,
'vb' : vb,
}
fig,ax = plt.subplots(figsize=(8,5))
ax.errorbar(r, vobs, yerr=e_vobs, fmt='.', c='k', lw=0.5, label=r'$\rm data$')
ax.plot(r, vg, ':', c='tab:cyan', label=r'$\rm gas$')
ax.plot(r, np.sqrt(10**params['log_mld']*vd**2+10**params['log_mlb']*vb**2), '--', c='tab:olive', label=r'$\rm stars$')
ax.plot(r, jax_vhalo(params, r), '-.', c='tab:purple', label=r'$\rm DM$')
ax.plot(r, jax_vmod(params, r), c='grey', lw=2, label=r'$\rm fit$')
ax.set_xlabel(r"$\rm radius/kpc$", fontsize=18)
ax.set_ylabel(r"$\rm velocity/km\,s^{-1}$", fontsize=18)
ax.set_title(r"$\rm NGC 2403$", fontsize=20);
ax.legend(loc='lower right', fontsize=14);
ax.tick_params(labelsize=14);
Let's now get to the modelling side of things. I set up two models here. The first one is analogous to Posti et al. (2019), as well as many other works in this context (e.g. Katz et al. 2017, Li et al. 2020, Mancera-Pina et al. 2022, di Teodoro et al. 2022), it has a $\chi^2$ likelihood and it implicitly assumes that the rotation curve datapoints are independent. The second one generalizes this model by using GPs to take into account data correlations. I recommend the recent review by Aigrain & Foreman-Mackey (2022), in particular their first section, as an introduction to GPs.
I use the library tinygp
to set up my GP regression problem and I use numpyro
to sample the posterior distribution. In particular, I use and Hamiltonian Monte Carlo sampler called No U-Turn Sampler (NUTS).
Let's generate GPs with an Exp-Squared
kernel with two parameters, an amplitude $A_k$ and a scale $s_k$, i.e.
$$
k(R_i, R_j) = A_k \exp\left[-\frac{1}{2}\left(\frac{|R_i-R_j|}{s_k}\right)^2\right]
$$
This kernel is said to be stationary because it depends only on the distance between two points $|R_i-R_j|$.
def build_gp(params, x, yerr):
kernel = 10**params['log_amp']*kernels.ExpSquared(10**params['log_scl'], distance=kernels.distance.L1Distance())
return GaussianProcess(kernel,
x,
diag=yerr**2,
mean=partial(jax_vmod, params)
)
I now set up the model's posterior distribution to be sampled by numpyro
, thus it is a function containing pyro
primitives.
The model starts by defining the priors on the physical parameters ($\theta_V$ in the paper). I use:
- uninformative prior on $\log\,M_{\rm h}$
- Gaussian on $\log\,c$, with mean and width following the $c-M_{\rm h}$ relation found in cosmological simulations (Dutton & Maccio' 2014)
- Gaussian on $\log\,(M/L)_{\rm D}$, centred on $(M/L)_{\rm D}=0.5$ and with standard deviation of 0.2 dex (compatible with stellar population synthesis models, e.g. Lelli et al. 2016)
- Gaussian on $\log\,(M/L)_{\rm B}$, centred on $(M/L)_{\rm B}=0.7$ and with standard deviation of 0.2 dex (again, see Lelli et al. 2016). Note that this is not used in the case of NGC 2403
After the definition of the priors the function branches out: one branch with GPs and one without. I borrowed this structure from the transit example of Fig. 3 in Aigrain & Foreman-Mackey (2022).
The branch with GP also implement additional priors for the two parameters of the kernel ($\theta_k$ in the paper), i.e. the amplitude and scale.
def model(t, y_err, y, params, use_gp=False):
# priors
params["log_mh"]=numpyro.sample("log_mh",numpyro.distributions.Uniform(8.0, 14.0))
params["log_c"] =numpyro.sample('log_c',numpyro.distributions.Normal(0.905-0.101*(params['log_mh']*0.7-12),0.11))
params["log_mld"]=numpyro.sample('log_mld',numpyro.distributions.Normal(-0.3, 0.2))
params["log_mlb"]=numpyro.sample('log_mlb',numpyro.distributions.Normal(-0.15, 0.2))
if use_gp:
###################
# branch WITH GPs #
###################
# define kernel parameters
params["log_amp"] = numpyro.sample("log_amp", numpyro.distributions.Uniform(-4.0, 5.0))
params["log_scl"] = numpyro.sample("log_scl", numpyro.distributions.Uniform(-2.0, 3.0))
# generate the GP
gp = build_gp(params, t, y_err)
# sample the posterior
numpyro.sample("y", gp.numpyro_dist(), obs=y)
# calculate the predicted V_rot (i.e. the mean function) of the model
mu = gp.mean_function(params["r_grid"])
numpyro.deterministic("mu", mu)
else:
######################
# branch WITHOUT GPs #
######################
# sample the posterior
numpyro.sample("y", numpyro.distributions.Normal(jax_vmod(params, t), y_err), obs=y)
# calculate properties of the model
numpyro.deterministic("mu", jax_vmod(params, params["r_grid"]))
I start by sampling the posterior of the model akin to that of Posti et al. (2019), i.e. assuming that the points in the curve are independent.
I'm using arviz
to analyse the posterior sampled by NUTS. In particular, keep an eye on r_hat
which is the Gelman-Rubin statistics: for our purposes, we can use ${\rm r_{hat}}\simeq 1$ as an indicator that the marginalized posterior on a particular parameter is well determined.
grid_size = 1000
r_grid = jnp.linspace(r.min(), r.max(), grid_size) # radial grid on which to predict V_rot(R)
params = {"vg" : vg,
"vd" : vd,
"vb" : vb,
"r" : r,
"r_grid": r_grid}
num_warmup=1000
num_samples=3000
num_chains=3
accept_prob = 0.9
sampler_wn = numpyro.infer.MCMC(
numpyro.infer.NUTS(
model,
dense_mass=True,
target_accept_prob=accept_prob,
),
num_warmup=num_warmup,
num_samples=num_samples,
num_chains=num_chains,
progress_bar=True,
)
%time sampler_wn.run(jax.random.PRNGKey(11), r, e_vobs, vobs, params)
inf_data_wn = arviz.from_numpyro(sampler_wn)
arviz.summary(inf_data_wn, var_names=["log_mh", "log_c", "log_mld"])
Let's plot the chains to make sure each parameter is converged
azLabeller = arviz.labels.MapLabeller(var_name_map={"log_mh" : r"$\log\,M_{\rm h}$",
"log_c" : r"$\log\,c$",
"log_mld": r"$\log\,M/L$",
"log_amp": r"$A_k$",
"log_scl": r"$s_k$"})
arviz.plot_trace(inf_data_wn, var_names=["log_mh", "log_c", "log_mld"], figsize=(12,9),
labeller = azLabeller);
Now, let's have a look at what happens when GPs come into play. I run exactly the same procedure as before to sample the model's posterior, but this time I select the branch with use_gp=True
. Since now I have two more free parameters, the scale and amplitude of the kernel, I also initialize these two in init_strategy
.
sampler = numpyro.infer.MCMC(
numpyro.infer.NUTS(
model,
dense_mass=True,
target_accept_prob=accept_prob,
),
num_warmup=num_warmup,
num_samples=num_samples,
num_chains=num_chains,
progress_bar=True,
)
%time sampler.run(jax.random.PRNGKey(11), r, e_vobs, vobs, params, use_gp=True)
inf_data = arviz.from_numpyro(sampler)
arviz.summary(inf_data, var_names=["log_mh", "log_c", "log_mld", "log_amp", "log_scl"])
And let's plot again the chains to make sure that every parameter has converged
arviz.plot_trace(inf_data, var_names=["log_mh", "log_c", "log_mld", "log_amp", "log_scl"], figsize=(12,15),
labeller = azLabeller);
Everything looks good for both models.
Finally, I can compare the posterior distributions of the two models. I use corner to plot 1-D and 2-D projections of the posteriors.
ranges = [(11.2, 11.8), (0.8, 1.35), (-0.8,-0.0)] # NGC 2403
fig = corner.corner(inf_data_wn, bins=40, range=ranges,
color="C0", var_names=["log_mh", "log_c", "log_mld"], smooth=1.0, smooth1d=1.0
)
fig = corner.corner(inf_data, bins=40, range=ranges,
color="C1", var_names=["log_mh", "log_c", "log_mld"], smooth=1.0, smooth1d=1.0,
labels=["$\log\,M_h$", "$\log\,c$", r"$\log\,M/L$"],
label_kwargs={"fontsize":20}, fig=fig)
# make legend
ax = fig.axes[1]
key_nn = matplotlib.lines.Line2D([], [], color='C0', linestyle='-',
label=r'$\rm Model\,\,{\bf without\,\,GPs}:\,\,independent\,\,data$')
key_gp = matplotlib.lines.Line2D([], [], color='C1', linestyle='-',
label=r'$\rm Model\,\,{\bf with\,\,GPs}:\,\,correlated\,\,data$')
ax.legend(loc='upper left', handles=[key_nn, key_gp], fontsize=16);
Then I can compare the predicted curve decompositions of the two models. Here I get the predictions of the model, excluding the warmup phase during sampling.
final_shape = (num_chains*(num_samples-num_warmup),grid_size) # shape of the predictions array after removing warmup
pred_wn = sampler_wn.get_samples(group_by_chain=True)['mu'][:,num_warmup:,:].reshape(final_shape)
pred = sampler.get_samples(group_by_chain=True)['mu'][:,num_warmup:,:].reshape(final_shape)
fig,ax = plt.subplots(figsize=(7,8), nrows=2, gridspec_kw={'hspace':0})
def commons(ax, i, lmh, lc, lmld, lmlb, leg=True):
# rotation curve
ax[i].errorbar(r, vobs, yerr=e_vobs, fmt='.', c='k', lw=0.5, label=r'$\rm data$', markersize=4)
ax[i].plot(r, vg, ':', c='tab:cyan', label=r'$\rm gas$')
ax[i].plot(r, np.sqrt(10**lmld*vd**2+10**lmlb*vb**2), '--', c='tab:olive', label=r'$\rm stars$')
ax[i].plot(r, jax_vhalo({'log_mh':lmh, 'log_c':lc}, r), '-.', c='tab:purple', label=r'$\rm DM$')
if leg: ax[i].legend(loc='lower right', fontsize=14)
ax[i].set_xlabel(r"$\rm radius/kpc$", fontsize=18)
ax[i].set_ylabel(r"$\rm velocity/km\,s^{-1}$", fontsize=18)
ax[i].tick_params(direction='inout', top=True, labelsize=14)
ax[0].fill_between(r_grid, np.percentile(pred_wn,2.1,axis=0), np.percentile(pred_wn,97.9,axis=0),
facecolor='C0', alpha=0.3)
ax[1].fill_between(r_grid, np.percentile(pred,2.1,axis=0), np.percentile(pred,97.9,axis=0),
facecolor='C1', alpha=0.3)
ax[0].plot(r_grid, np.median(pred_wn, axis=0), 'C0')
ax[1].plot(r_grid, np.median(pred, axis=0), 'C1')
commons(ax, 0, 11.325, 1.234, -0.542, -99.)
commons(ax, 1, 11.491, 1.079, -0.326, -99., leg=False)
Given that the parameters of the kernel $A_k$ and $s_k$ are well constrained by the model, we can have a look at the correlation matrix of the model with GPs.
%%time
def get_kmat(params, x, yerr,):
gp = build_gp(params, x, yerr)
xm1, xm2 = jnp.meshgrid(x,x)
zm = np.zeros_like(xm1.flatten())
for i in range(len(xm1.flatten())):
zm[i]=gp.kernel.evaluate(xm1.flatten()[i], xm2.flatten()[i])
return zm.reshape((len(x), len(x)))
zm = get_kmat({"log_amp":1.104, "log_scl":-0.018, 'log_mh':11.325, 'log_c':1.234,
'log_mld':-0.542, 'log_mlb':-99., 'vg':vg, 'vd':vd, 'vb':vb, 'r':r}, r, e_vobs)
def plt_mat(ax, zm):
im=ax.matshow(zm)
ax.set_xlabel("$i$", fontsize=20)
ax.set_ylabel("$j$", fontsize=20)
ax.tick_params(labelsize=14)
ax.xaxis.set_label_position('top')
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.05)
cb = plt.colorbar(im, cax=cax)
cb.set_label(r"$k(R_i, R_j)$", fontsize=22)
cb.ax.tick_params(labelsize=16)
fig,ax = plt.subplots()
plt_mat(ax, zm)
The peculiar 2-block shape of this matrix is due to the combined H$\alpha$-HI nature of the rotation curve. The optical H$\alpha$ curve samples the inner regions of the galaxy, up to $\sim 5$ kpc, with a finer spatial sampling of 0.1 kpc with respect to the HI rotation curve, which instead has a spacing of 0.5 kpc and is from 5 kpc outwards.