Converting emcee objects to DataTree#

DataTree is the data format ArviZ relies on.

This page covers multiple ways to generate a DataTree from emcee objects.


We will start by importing the required packages and defining the model: the famous eight schools model.

import arviz_base as az
import numpy as np
import emcee
import xarray as xr

xr.set_options(display_expand_attrs=False);
J = 8
y_obs = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])
sigma = np.array([15.0, 10.0, 16.0, 11.0, 9.0, 11.0, 10.0, 18.0])
def log_prior_8school(theta):
    mu, tau, eta = theta[0], theta[1], theta[2:]
    # Half-cauchy prior, hwhm=25
    if tau < 0:
        return -np.inf
    prior_tau = -np.log(tau**2 + 25**2)
    prior_mu = -((mu / 10) ** 2)  # normal prior, loc=0, scale=10
    prior_eta = -np.sum(eta**2)  # normal prior, loc=0, scale=1
    return prior_mu + prior_tau + prior_eta


def log_likelihood_8school(theta, y, s):
    mu, tau, eta = theta[0], theta[1], theta[2:]
    return -(((mu + tau * eta - y) / s) ** 2)


def lnprob_8school(theta, y, s):
    prior = log_prior_8school(theta)
    like_vect = log_likelihood_8school(theta, y, s)
    like = np.sum(like_vect)
    return like + prior
nwalkers = 40  # called chains in ArviZ
ndim = J + 2
draws = 1500
pos = np.random.normal(size=(nwalkers, ndim))
pos[:, 1] = np.absolute(pos[:, 1])
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob_8school, args=(y_obs, sigma))
sampler.run_mcmc(pos, draws);
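Under the hood, emcee stores the samples with shape `(draws, walkers, ndim)` (this is what `sampler.get_chain()` returns), and the converter relabels the walker dimension as `chain`. A minimal NumPy sketch of that relabeling, with a random array standing in for the real `get_chain()` output:

```python
import numpy as np

rng = np.random.default_rng(0)
draws, nwalkers, ndim = 1500, 40, 10

# stand-in for sampler.get_chain(), which has shape (draws, walkers, ndim)
chain = rng.normal(size=(draws, nwalkers, ndim))

# ArviZ-style layout: one array per variable, with walkers relabeled as chains
mu = chain[:, :, 0]       # dims (draw, chain)
tau = chain[:, :, 1]      # dims (draw, chain)
eta = chain[:, :, 2:]     # dims (draw, chain, eta_dim_0)

print(mu.shape, eta.shape)
```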

Manually set variable names#

This first example shows how to convert the sampler while manually setting only the variable names, leaving everything else to the ArviZ defaults.

# define variable names; they cannot be inferred from emcee
var_names = ["mu", "tau"] + ["eta{}".format(i) for i in range(J)]
idata1 = az.from_emcee(sampler, var_names=var_names)
idata1
<xarray.DataTree>
Group: /
├── Group: /posterior
│       Dimensions:  (draw: 1500, chain: 40)
│       Coordinates:
│         * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
│         * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
│       Data variables:
│           mu       (draw, chain) float64 480kB 0.8202 0.211 -1.375 ... 4.206 8.141
│           tau      (draw, chain) float64 480kB 1.196 0.7998 0.7844 ... 10.15 15.55
│           eta0     (draw, chain) float64 480kB -1.047 2.258 0.6618 ... 0.591 0.8605
│           eta1     (draw, chain) float64 480kB 0.4484 1.004 0.7797 ... -0.1798 0.2171
│           eta2     (draw, chain) float64 480kB -0.5649 0.9961 ... -0.6349 0.4915
│           eta3     (draw, chain) float64 480kB 0.003121 -0.4402 ... -0.4241 0.5467
│           eta4     (draw, chain) float64 480kB -1.396 -0.8508 ... -0.168 0.06493
│           eta5     (draw, chain) float64 480kB -0.6115 0.4832 ... 0.7364 -0.01663
│           eta6     (draw, chain) float64 480kB -0.9003 0.5429 -0.6938 ... 0.175 0.5477
│           eta7     (draw, chain) float64 480kB -0.02164 1.641 ... -0.6215 0.5602
│       Attributes: (6)
├── Group: /observed_data
│       Dimensions:      (arg_0_dim_0: 8, arg_1_dim_0: 8)
│       Coordinates:
│         * arg_0_dim_0  (arg_0_dim_0) int64 64B 0 1 2 3 4 5 6 7
│         * arg_1_dim_0  (arg_1_dim_0) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           arg_0        (arg_0_dim_0) float64 64B 28.0 8.0 -3.0 7.0 -1.0 1.0 18.0 12.0
│           arg_1        (arg_1_dim_0) float64 64B 15.0 10.0 16.0 ... 11.0 10.0 18.0
│       Attributes: (6)
└── Group: /sample_stats
        Dimensions:  (draw: 1500, chain: 40)
        Coordinates:
          * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
          * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
        Data variables:
            lp       (draw, chain) float64 480kB -19.31 -25.0 -19.5 ... -14.01 -13.58
        Attributes: (6)

ArviZ has stored the posterior variables under the provided names as expected, but it has also included other useful information in the DataTree. The log probability of each sample is stored in the sample_stats group under the name lp, and all the arguments passed to the sampler via args have been saved in the observed_data group.

It can also be useful to apply a burn-in cut to the MCMC samples (see the DataTree sel method for more details):

idata1.sel(draw=slice(100, None))
<xarray.DataTree>
Group: /
├── Group: /posterior
│       Dimensions:  (draw: 1400, chain: 40)
│       Coordinates:
│         * draw     (draw) int64 11kB 100 101 102 103 104 ... 1495 1496 1497 1498 1499
│         * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
│       Data variables:
│           mu       (draw, chain) float64 448kB 3.829 8.432 5.303 ... 9.19 4.206 8.141
│           tau      (draw, chain) float64 448kB 5.871 7.587 7.144 ... 8.75 10.15 15.55
│           eta0     (draw, chain) float64 448kB 0.3892 0.6282 1.529 ... 0.591 0.8605
│           eta1     (draw, chain) float64 448kB -0.5472 0.3282 ... -0.1798 0.2171
│           eta2     (draw, chain) float64 448kB -0.661 -1.35 0.04889 ... -0.6349 0.4915
│           eta3     (draw, chain) float64 448kB 0.1321 -0.3383 ... -0.4241 0.5467
│           eta4     (draw, chain) float64 448kB -1.078 0.198 -0.2614 ... -0.168 0.06493
│           eta5     (draw, chain) float64 448kB 0.4592 -0.9649 ... 0.7364 -0.01663
│           eta6     (draw, chain) float64 448kB 0.9126 0.7851 1.251 ... 0.175 0.5477
│           eta7     (draw, chain) float64 448kB -0.1107 0.672 1.494 ... -0.6215 0.5602
│       Attributes: (6)
├── Group: /observed_data
│       Dimensions:      (arg_0_dim_0: 8, arg_1_dim_0: 8)
│       Coordinates:
│         * arg_0_dim_0  (arg_0_dim_0) int64 64B 0 1 2 3 4 5 6 7
│         * arg_1_dim_0  (arg_1_dim_0) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           arg_0        (arg_0_dim_0) float64 64B 28.0 8.0 -3.0 7.0 -1.0 1.0 18.0 12.0
│           arg_1        (arg_1_dim_0) float64 64B 15.0 10.0 16.0 ... 11.0 10.0 18.0
│       Attributes: (6)
└── Group: /sample_stats
        Dimensions:  (draw: 1400, chain: 40)
        Coordinates:
          * draw     (draw) int64 11kB 100 101 102 103 104 ... 1495 1496 1497 1498 1499
          * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
        Data variables:
            lp       (draw, chain) float64 448kB -13.81 -14.4 -17.26 ... -14.01 -13.58
        Attributes: (6)

Structuring the posterior as multidimensional variables#

This way of calling from_emcee stores each eta as a separate variable named eta#. However, they are really components of a single multidimensional variable, as can be seen in the likelihood and prior functions, where theta is unpacked as:

mu, tau, eta = theta[0], theta[1], theta[2:]

ArviZ supports multidimensional variables, and you can tell it how to split theta the same way the likelihood and prior functions do. Defining slices for multidimensional variables is compatible with the var_names argument used in the previous example:

idata2 = az.from_emcee(sampler, slices=[0, 1, slice(2, None)], var_names=["mu", "tau", "eta"])
idata2
<xarray.DataTree>
Group: /
├── Group: /posterior
│       Dimensions:    (draw: 1500, chain: 40, eta_dim_0: 8)
│       Coordinates:
│         * draw       (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
│         * chain      (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 32 33 34 35 36 37 38 39
│         * eta_dim_0  (eta_dim_0) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           mu         (draw, chain) float64 480kB 0.8202 0.211 -1.375 ... 4.206 8.141
│           tau        (draw, chain) float64 480kB 1.196 0.7998 0.7844 ... 10.15 15.55
│           eta        (draw, chain, eta_dim_0) float64 4MB -1.047 0.4484 ... 0.5602
│       Attributes: (6)
├── Group: /observed_data
│       Dimensions:      (arg_0_dim_0: 8, arg_1_dim_0: 8)
│       Coordinates:
│         * arg_0_dim_0  (arg_0_dim_0) int64 64B 0 1 2 3 4 5 6 7
│         * arg_1_dim_0  (arg_1_dim_0) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           arg_0        (arg_0_dim_0) float64 64B 28.0 8.0 -3.0 7.0 -1.0 1.0 18.0 12.0
│           arg_1        (arg_1_dim_0) float64 64B 15.0 10.0 16.0 ... 11.0 10.0 18.0
│       Attributes: (6)
└── Group: /sample_stats
        Dimensions:  (draw: 1500, chain: 40)
        Coordinates:
          * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
          * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
        Data variables:
            lp       (draw, chain) float64 480kB -19.31 -25.0 -19.5 ... -14.01 -13.58
        Attributes: (6)
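The slices argument can be read as plain NumPy indexing on the flattened parameter vector: each entry picks out the components of one named variable. A self-contained sketch, with a random theta standing in for a single emcee sample:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=10)  # one sample: mu, tau and the 8 etas

# same pairing that from_emcee applies internally (illustrative only)
slices = [0, 1, slice(2, None)]
var_names = ["mu", "tau", "eta"]
split = {name: theta[idx] for name, idx in zip(var_names, slices)}

print(np.shape(split["mu"]), np.shape(split["eta"]))
```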

blobs: unlock sample stats, posterior predictive and miscellanea#

Emcee does not store per-draw sample stats; however, it has a feature called blobs that allows storing arbitrary variables on a per-draw basis. Blobs can be used to store sample_stats or even posterior_predictive data.

You can modify the probability function to use this blobs functionality and store the pointwise log likelihood, then rerun the sampler using the new function:

def lnprob_8school_blobs(theta, y, s):
    prior = log_prior_8school(theta)
    like_vect = log_likelihood_8school(theta, y, s)
    like = np.sum(like_vect)
    return like + prior, like_vect


sampler_blobs = emcee.EnsembleSampler(
    nwalkers,
    ndim,
    lnprob_8school_blobs,
    args=(y_obs, sigma),
)
sampler_blobs.run_mcmc(pos, draws);

You can now use the blob_names argument to indicate how to store this blob-defined variable. As no group is specified, it will go to sample_stats. Note that blob_names is added on top of the arguments covered in the previous examples, and we also introduce the coords and dims arguments to show the flexibility of the converter.

dims = {"eta": ["school"], "log_likelihood": ["school"]}
idata3 = az.from_emcee(
    sampler_blobs,
    var_names=["mu", "tau", "eta"],
    slices=[0, 1, slice(2, None)],
    blob_names=["y"],
    dims=dims,
    coords={"school": range(8)},
)
idata3
<xarray.DataTree>
Group: /
├── Group: /posterior
│       Dimensions:  (draw: 1500, chain: 40, school: 8)
│       Coordinates:
│         * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
│         * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
│         * school   (school) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           mu       (draw, chain) float64 480kB 0.8202 0.211 -1.375 ... 4.206 8.141
│           tau      (draw, chain) float64 480kB 1.196 0.7998 0.7844 ... 10.15 15.55
│           eta      (draw, chain, school) float64 4MB -1.047 0.4484 ... 0.5477 0.5602
│       Attributes: (6)
├── Group: /observed_data
│       Dimensions:      (arg_0_dim_0: 8, arg_1_dim_0: 8)
│       Coordinates:
│         * arg_0_dim_0  (arg_0_dim_0) int64 64B 0 1 2 3 4 5 6 7
│         * arg_1_dim_0  (arg_1_dim_0) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           arg_0        (arg_0_dim_0) float64 64B 28.0 8.0 -3.0 7.0 -1.0 1.0 18.0 12.0
│           arg_1        (arg_1_dim_0) float64 64B 15.0 10.0 16.0 ... 11.0 10.0 18.0
│       Attributes: (6)
├── Group: /log_likelihood
│       Dimensions:  (draw: 1500, chain: 40, y_dim_0: 8)
│       Coordinates:
│         * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
│         * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
│         * y_dim_0  (y_dim_0) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           y        (draw, chain, y_dim_0) float64 4MB -3.593 -0.4414 ... -0.07264
│       Attributes: (6)
└── Group: /sample_stats
        Dimensions:  (draw: 1500, chain: 40)
        Coordinates:
          * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
          * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
        Data variables:
            lp       (draw, chain) float64 480kB -19.31 -25.0 -19.5 ... -14.01 -13.58
        Attributes: (6)

Multi-group blobs#

You might even have more complicated blobs, each corresponding to a different group of the DataTree. Moreover, you can store the variables passed to the EnsembleSampler via the args argument in the observed or constant data groups. This is shown in the example below:

sampler_blobs.blobs[0, 1]
array([-3.00054720e+00, -4.88089332e-01, -6.27415297e-02, -4.21450785e-01,
       -3.47373734e-03, -1.33900316e-03, -3.01188811e+00, -3.38774354e-01])
def lnprob_8school_blobs(theta, y, sigma):
    mu, tau, eta = theta[0], theta[1], theta[2:]
    prior = log_prior_8school(theta)
    like_vect = log_likelihood_8school(theta, y, sigma)
    like = np.sum(like_vect)
    # store pointwise log likelihood, useful for model comparison with az.loo or az.waic
    # and posterior predictive samples as blobs
    return like + prior, (like_vect, np.random.normal((mu + tau * eta), sigma))


sampler_blobs = emcee.EnsembleSampler(
    nwalkers,
    ndim,
    lnprob_8school_blobs,
    args=(y_obs, sigma),
)
sampler_blobs.run_mcmc(pos, draws);

dims = {"eta": ["school"], "log_likelihood": ["school"], "y": ["school"]}
idata4 = az.from_emcee(
    sampler_blobs,
    var_names=["mu", "tau", "eta"],
    slices=[0, 1, slice(2, None)],
    arg_names=["y", "sigma"],
    arg_groups=["observed_data", "constant_data"],
    blob_names=["y", "y"],
    blob_groups=["log_likelihood", "posterior_predictive"],
    dims=dims,
    coords={"school": range(8)},
)
idata4
<xarray.DataTree>
Group: /
├── Group: /posterior
│       Dimensions:  (draw: 1500, chain: 40, school: 8)
│       Coordinates:
│         * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
│         * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
│         * school   (school) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           mu       (draw, chain) float64 480kB 0.8202 0.211 -1.375 ... 4.206 8.141
│           tau      (draw, chain) float64 480kB 1.196 0.7998 0.7844 ... 10.15 15.55
│           eta      (draw, chain, school) float64 4MB -1.047 0.4484 ... 0.5477 0.5602
│       Attributes: (6)
├── Group: /observed_data
│       Dimensions:  (school: 8)
│       Coordinates:
│         * school   (school) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           y        (school) float64 64B 28.0 8.0 -3.0 7.0 -1.0 1.0 18.0 12.0
│       Attributes: (6)
├── Group: /constant_data
│       Dimensions:      (sigma_dim_0: 8)
│       Coordinates:
│         * sigma_dim_0  (sigma_dim_0) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           sigma        (sigma_dim_0) float64 64B 15.0 10.0 16.0 ... 11.0 10.0 18.0
│       Attributes: (6)
├── Group: /log_likelihood
│       Dimensions:  (draw: 1500, chain: 40, school: 8)
│       Coordinates:
│         * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
│         * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
│         * school   (school) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           y        (draw, chain, school) float64 4MB -3.593 -0.4414 ... -0.07264
│       Attributes: (6)
├── Group: /posterior_predictive
│       Dimensions:  (draw: 1500, chain: 40, school: 8)
│       Coordinates:
│         * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
│         * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
│         * school   (school) int64 64B 0 1 2 3 4 5 6 7
│       Data variables:
│           y        (draw, chain, school) float64 4MB -10.08 7.933 ... 18.78 39.08
│       Attributes: (6)
└── Group: /sample_stats
        Dimensions:  (draw: 1500, chain: 40)
        Coordinates:
          * draw     (draw) int64 12kB 0 1 2 3 4 5 6 ... 1494 1495 1496 1497 1498 1499
          * chain    (chain) int64 320B 0 1 2 3 4 5 6 7 8 ... 31 32 33 34 35 36 37 38 39
        Data variables:
            lp       (draw, chain) float64 480kB -19.31 -25.0 -19.5 ... -14.01 -13.58
        Attributes: (6)

This last version, which contains both observed data and posterior predictive samples, can be used to plot posterior predictive checks with plot_ppc_dist.
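A rough numeric version of that check can also be done by hand from the same arrays. A hedged NumPy sketch, with random draws standing in for the real posterior predictive samples (dims (draw, chain, school), as in the output above): the fraction of predictive draws below each observation plays the role of a posterior predictive p-value, and values near 0 or 1 flag schools the model predicts poorly.

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = np.array([28.0, 8.0, -3.0, 7.0, -1.0, 1.0, 18.0, 12.0])

# stand-in for the posterior_predictive group's y variable
y_pp = rng.normal(loc=8.0, scale=12.0, size=(1500, 40, 8))

# per-school fraction of predictive draws below the observed value
p = (y_pp < y_obs).mean(axis=(0, 1))
print(p.shape)
```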

%load_ext watermark
%watermark -n -u -v -iv -w
Last updated: Sat, 28 Feb 2026

Python implementation: CPython
Python version       : 3.12.12
IPython version      : 9.6.0

arviz_base: 0.9.0.dev0
emcee     : 3.1.6
numpy     : 2.3.4
xarray    : 2025.10.1

Watermark: 2.6.0