Data Structures#

Note

These data structures represent the main input and output data that beorn works with. They include:

  • the flux profiles that describe radiation fields around sources

  • the catalogs of halos used to give spatial context to the data

  • the resulting output grids of radiation and ionization fields (in one or multiple redshift slices)

class beorn.structs.CoevalCube(delta_b: numpy.ndarray, Grid_Temp: numpy.ndarray, Grid_xHII: numpy.ndarray, Grid_xal: numpy.ndarray, z: float, *, _file_path: Path = None, parameters: Parameters = None)[source]#

Grid data for a single redshift snapshot. The fundamental grids computed during the painting step of the simulation are stored as base properties, while quantities derived from them are implemented as properties in mixin classes.

to_arrays() None[source]#

Ensure all fields are plain numpy arrays.

When a CoevalCube is loaded from HDF5 its datasets are h5py.Dataset objects which are not picklable across MPI processes. This helper converts such datasets to numpy arrays in place.

z: float#

Redshift of the snapshot.

class beorn.structs.HaloCatalog(positions: numpy.ndarray, masses: numpy.ndarray, parameters: Parameters, redshift_index: int = 0, redshift: float = None, alphas: numpy.ndarray = None)[source]#

Container for halo positions, masses and associated properties.

Instances represent all halos at a single given redshift and provide filtering and utility methods used by the painting stage.

at_indices(indices) HaloCatalog[source]#

Return a new HaloCatalog restricted to the given indices.

Parameters:

indices (array-like) – Indices selecting a subset of halos.

Returns:

New instance containing only the selected halos.

Return type:

HaloCatalog

get_halo_indices(alpha_range: list[float], mass_range: list[float]) numpy.ndarray[source]#

Return indices of halos within provided alpha and mass ranges.

Parameters:
  • alpha_range (list[float]) – [alpha_min, alpha_max).

  • mass_range (list[float]) – [mass_min, mass_max).

Returns:

Integer indices of matching halos (may be empty).

Return type:

numpy.ndarray
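The half-open range selection described above, combined with at_indices, can be sketched in plain NumPy (a standalone illustration with made-up values, not beorn's source):

```python
import numpy as np

# Toy catalog: four halos with masses (Msun) and accretion rates (alpha).
masses = np.array([1e8, 5e9, 2e10, 8e11])
alphas = np.array([0.5, 0.79, 0.9, 1.2])

alpha_range = [0.7, 1.0]   # [alpha_min, alpha_max)
mass_range = [1e9, 1e11]   # [mass_min, mass_max)

# Both ranges are half-open: the lower edge is included, the upper excluded.
mask = (
    (alphas >= alpha_range[0]) & (alphas < alpha_range[1])
    & (masses >= mass_range[0]) & (masses < mass_range[1])
)
indices = np.nonzero(mask)[0]
print(indices)  # [1 2]
```

Passing `indices` to at_indices then yields the restricted sub-catalog used during painting.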

halo_mass_function(bin_count: int = None) tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]#

Compute the halo mass function (dn/dlnM) and Poisson errors.

Parameters:

bin_count (int|None) – Number of mass bins. If None a default value proportional to the mass dynamic range is used.

Returns:

(bin_edges, hmf, error) where hmf is in units of (Mpc/h)^-3 and error is the Poisson uncertainty.

Return type:

tuple
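A minimal dn/dlnM estimate with Poisson errors might look like the following. The binning choice, variable names, and box volume are assumptions for illustration, not beorn's exact implementation:

```python
import numpy as np

# Synthetic log-uniform halo masses in a (100 Mpc/h)^3 box.
rng = np.random.default_rng(42)
masses = 10 ** rng.uniform(8, 12, size=1000)   # Msun
volume = 100.0 ** 3                            # (Mpc/h)^3

bin_count = 10
bin_edges = np.logspace(8, 12, bin_count + 1)
counts, _ = np.histogram(masses, bins=bin_edges)

dlnM = np.diff(np.log(bin_edges))              # ln-bin widths
hmf = counts / (volume * dlnM)                 # dn/dlnM in (Mpc/h)^-3
error = np.sqrt(counts) / (volume * dlnM)      # Poisson uncertainty
```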

to_mesh() numpy.ndarray[source]#

Rasterize halo positions into a 3D number-count mesh.

The mesh uses the nearest-neighbor mass-assignment scheme. The returned array represents halo counts (number density), not mass.

Returns:

3D float32 array with shape (Ncell, Ncell, Ncell).

Return type:

numpy.ndarray
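The nearest-neighbor count assignment can be sketched as follows (cell-indexing details are assumed; only the counts-not-mass behaviour is taken from the docstring):

```python
import numpy as np

Lbox, Ncell = 100.0, 64                      # box side in cMpc, cells per side
positions = np.array([[10.0, 20.0, 30.0],
                      [10.4, 20.2, 30.1],    # lands in the same cell
                      [99.9, 0.1, 50.0]])

# Nearest-grid-point assignment: each halo contributes 1 to its host cell.
cells = np.floor(positions / Lbox * Ncell).astype(int) % Ncell
mesh = np.zeros((Ncell, Ncell, Ncell), dtype=np.float32)
np.add.at(mesh, tuple(cells.T), 1.0)         # counts, not mass

print(mesh.sum())  # 3.0 — one count per halo
```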

alphas: numpy.ndarray = None#

Halo mass accretion rate, calculated from the mass history of the halo. Defaults to 0.79 when no mass history is available; inputs that provide a mass history override this with an array of the same shape as masses.

masses: numpy.ndarray#

Halo masses in units of Msun => shape=(N,)

parameters: Parameters#

The parameters of the simulation, which are used to filter the halo catalog.

positions: numpy.ndarray#

Halo positions in 3D space (X, Y, Z coordinates) in units of cMpc => shape=(N, 3)

redshift: float = None#

Actual redshift of this snapshot. Preferred over redshift_index for any quantity that needs z.

redshift_index: int = 0#

The index of the redshift snapshot that this catalog corresponds to. Used to look up the accretion history.

property size: int#

Returns the number of halos in the catalog.

class beorn.structs.Lightcone(data: numpy.ndarray, redshifts: numpy.ndarray, parameters: Parameters, quantity: str)[source]#

Class to handle lightcone data generated from a series of simulation snapshots.

data: numpy.ndarray#

The lightcone data array containing the desired quantity (e.g., brightness temperature) with spatial and redshift dimensions.

parameters: Parameters#

The parameters of the simulation.

quantity: str#

The quantity represented in the lightcone (e.g., ‘dTb’ for brightness temperature).

redshifts: numpy.ndarray#

Array of redshifts corresponding to the slices in the lightcone data.

class beorn.structs.Parameters(source: SourceParameters = <factory>, solver: SolverParameters = <factory>, cosmology: CosmologyParameters = <factory>, simulation: SimulationParameters = <factory>, cosmo_sim: CosmoSimParameters = <factory>)[source]#

Group all the parameters for the simulation.

classmethod from_dict(params_dict: dict) Parameters[source]#

Create a Parameters object from a dictionary. This is useful for loading parameters from a file.

classmethod from_group(group: h5py.Group) Parameters[source]#

Create a Parameters object from an hdf5 group. This is useful for loading parameters from an hdf5 file.

classmethod from_yaml(yaml_path: Path) Parameters[source]#

Create a Parameters object from a YAML file.

beorn_hash() str[source]#

Short MD5 hash of BEoRN-specific parameters (source, solver, simulation).

Cosmology is intentionally excluded — it is already encoded in the input data directory name (e.g. the py21cmfast subdirectory). This hash therefore differentiates astrophysical models applied to the same underlying density/halo data.

cosmo_sim is also excluded: it controls which input data is used but does not affect the underlying physics model — it is already encoded in the input_tag.

profiles_hash() str[source]#

Short MD5 hash of parameters that affect the 1D radiation profiles.

Covers source parameters, cosmology, solver redshifts, and the halo mass / accretion-rate bins. Intentionally excludes random seed, grid dimensions (Ncell, Lbox, py21cmfast_high_res_factor), and other simulation parameters that do not influence the 1D profile shapes. This allows profiles to be reused when re-running BEoRN with a different py21cmfast seed or a different grid resolution.

summary_str() str[source]#

Return a concise human-readable summary of the key model parameters.

to_yaml(path: Path, exclude_keys: set[str] | None = None) None[source]#

Write parameters to a human-readable YAML file at path.

Parameters:
  • path – Destination file path.

  • exclude_keys

    Optional set of strings to omit. Two forms are supported:

    • "section" — remove the entire top-level section, e.g. {"simulation", "cosmo_sim"}.

    • "section.field" — remove a single field within a section, e.g. {"solver.redshifts"}.
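The two exclusion forms could be sketched as a dict-pruning step before YAML serialisation (the helper name and logic are illustrative, not beorn's source):

```python
def prune(params: dict, exclude_keys: set[str]) -> dict:
    """Drop whole sections ("section") or single fields ("section.field")."""
    out = {}
    for section, fields in params.items():
        if section in exclude_keys:          # "section" form drops it all
            continue
        out[section] = {k: v for k, v in fields.items()
                        if f"{section}.{k}" not in exclude_keys}
    return out

params = {"solver": {"redshifts": [7, 9], "dz": 0.5},
          "simulation": {"Ncell": 128}}
print(prune(params, {"simulation", "solver.redshifts"}))
# {'solver': {'dz': 0.5}}
```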

unique_hash() str[source]#

Generate a unique hash for the current set of parameters. This can be used as a unique key when caching computations.

cosmo_sim: CosmoSimParameters#

cosmo-sim input parameters (py21cmfast, Thesan, PKDGrav, etc.)

cosmology: CosmologyParameters#

cosmological parameters

simulation: SimulationParameters#

simulation parameters

solver: SolverParameters#

solver parameters

source: SourceParameters#

source parameters

class beorn.structs.RadiationProfiles(z_history: numpy.ndarray, halo_mass_bins: numpy.ndarray, rho_xray: numpy.ndarray, rho_heat: numpy.ndarray, rho_alpha: numpy.ndarray, R_bubble: numpy.ndarray, r_lyal: numpy.ndarray, r_grid_cell: numpy.ndarray, *, _file_path: Path = None, parameters: Parameters = None)[source]#

Flux profiles around star forming halos, computed for a range of halo masses and accretion rates (alpha). The central assumption is that each halo with given key properties (mass, alpha) produces the same radiation profile, meaning these profiles can be reused for multiple halos in the painting step of the simulation.

classmethod get_file_path(directory: Path, parameters: Parameters, **kwargs) Path[source]#

Cache key covers only the parameters that shape the 1D profiles.

Random seed and grid dimensions (Ncell, Lbox, DIM) are excluded so that profiles computed for one py21cmfast realisation are reused when painting a different seed or grid resolution with the same physics.

profiles_of_halo_bin(z_index: int, alpha_index: slice, mass_index: slice) tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]#

Return the three core profiles for a halo bin.

Parameters:
  • z_index (int) – Redshift index.

  • alpha_index (int or slice) – Alpha (accretion) index or slice.

  • mass_index (int or slice) – Mass bin index or slice.

Returns:

(R_bubble, rho_alpha, rho_heat) arrays copied from the stored profiles for the requested bin.

Return type:

tuple

validate()[source]#

Check all profile arrays for NaN/Inf values.

Called automatically after fresh computation. When loading from disk the data was already validated at write time, so this is skipped to avoid loading the entire (potentially multi-GB) HDF5 dataset into RAM. Call explicitly if you need to re-validate a loaded profile.
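The finiteness check itself amounts to the following sketch (helper name and error message are illustrative):

```python
import numpy as np

def check_finite(name: str, arr: np.ndarray) -> None:
    """Raise if a profile array contains NaN or Inf values."""
    if not np.all(np.isfinite(arr)):
        raise ValueError(f"{name} contains NaN or Inf values")

check_finite("rho_heat", np.array([1.0, 2.0]))        # passes silently
try:
    check_finite("rho_xray", np.array([1.0, np.inf]))
except ValueError as err:
    print(err)  # rho_xray contains NaN or Inf values
```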

R_bubble: numpy.ndarray#

radius of the ionized bubble around the star forming halo

halo_mass_bins: numpy.ndarray#

bin edges of the halo masses for which the profiles have been computed. The radiation profile at index i corresponds to the halo mass range [halo_mass_bins[i], halo_mass_bins[i+1]].

r_grid_cell: numpy.ndarray#

radial grid of the profiles

r_lyal: numpy.ndarray#

radius of the Lyman alpha halo

rho_alpha: numpy.ndarray#

rho alpha profile

rho_heat: numpy.ndarray#

heating profile, derived from the X-ray profile

rho_xray: numpy.ndarray#

X-ray profile

z_history: numpy.ndarray#

redshift range for which the profiles have been computed. Corresponds to the parameters.solver.redshifts parameter.

class beorn.structs.TemporalCube(delta_b: numpy.ndarray, Grid_Temp: numpy.ndarray, Grid_xHII: numpy.ndarray, Grid_xal: numpy.ndarray, z_snapshots: numpy.ndarray = None, *, _file_path: Path = None, parameters: Parameters = None)[source]#

Collection of grid data over multiple redshifts.

Unlike other BaseStruct subclasses, a TemporalCube is stored as a directory of per-redshift HDF5 files rather than a single monolithic file:

igm_data_{input_tag}_{beorn_hash}/
    CoevalCube_z7.000.h5
    CoevalCube_z7.500.h5
    ...
    CoevalCube_z15.000.h5

Each CoevalCube_z{z:.3f}.h5 file is a standard CoevalCube HDF5 file containing delta_b, Grid_Temp, Grid_xHII, and Grid_xal for that redshift. This layout means:

  • Snapshots can be added incrementally without touching existing files.

  • The directory doubles as a resume checkpoint — painting skips redshifts whose file already exists.

  • There is no separate CoevalCube_*_z_index=N.h5 cache file.

classmethod create_empty(parameters: Parameters, directory: Path, snapshot_number: int = None, **kwargs) TemporalCube[source]#

Create an empty TemporalCube output directory.

Parameters:
  • parameters (Parameters) – Simulation parameters used to build the directory name.

  • directory (Path) – Parent directory under which the output folder is created.

  • snapshot_number – Ignored — kept for API compatibility. Per-z files are written on demand so no pre-allocation is needed.

  • **kwargs – Forwarded to get_file_path() (e.g. input_tag).

Returns:

New instance whose _file_path points to the created directory.

Return type:

TemporalCube

classmethod get_file_path(directory: Path, parameters: Parameters, input_tag: str = None, **kwargs) Path[source]#

Return the output directory path igm_data_{input_tag}_{beorn_hash}.

Parameters:

input_tag (str, optional) – Human-readable identifier for the upstream input data (e.g. 'py21cmfast_N128_D384_L100_seed12345_9b1f5a85').

classmethod grid_field_names(parameters: Parameters) list[str][source]#

Return all grid-like fields stored in the temporal cube manifest.

classmethod read(file_path: Path | str = None, directory: Path = None, parameters: Parameters = None, **kwargs)[source]#

Open a TemporalCube directory without loading grid data.

Scans for CoevalCube_z*.h5 files and records the available redshifts from their filenames. Grid arrays (Grid_xHII, Grid_Temp, etc.) are not loaded into memory; call load_grids() when you need them.

The simplest way to reopen a completed run from its output folder:

cube = TemporalCube.read("igm_data_<tag>_<hash>/")

Parameters are recovered automatically from the embedded HDF5 metadata (or from parameters.yaml if present), so no Parameters object is required.

Parameters:
  • file_path (Path | str) – Direct path to the output directory.

  • directory / parameters / **kwargs – Alternatively, derive the path via get_file_path().

classmethod simulation_name(parameters: Parameters) str[source]#

Infer a stable simulation name for snapshot file names.

classmethod snapshot_directory(cube_path: Path) Path[source]#

Return the directory containing per-snapshot field files.

classmethod snapshot_file_name(parameters: Parameters, field: str, snapshot_index: int) str[source]#

Build the per-snapshot field file name.

append(grid_snapshot: CoevalCube, index: int) None[source]#

Write a CoevalCube snapshot as CoevalCube_z{z:.3f}.h5 inside the directory.

Parameters:
  • grid_snapshot (CoevalCube) – Snapshot to write.

  • index (int) – Unused — kept for API compatibility. The file name is derived from grid_snapshot.z.

global_mean(field: str) numpy.ndarray[source]#

Compute the spatial mean of field at each redshift without loading all z into RAM.

Iterates over snapshots one at a time, so peak memory usage is one (Ncell, Ncell, Ncell) array regardless of how many redshifts exist.

Parameters:

field (str) – Name of the field to average, e.g. 'Grid_xHII'.

Returns:

1D array of shape (n_z,) with the spatial mean per redshift.

Return type:

np.ndarray
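The streaming pattern described above can be sketched like this; `load_snapshot` and `streaming_global_mean` are hypothetical names standing in for the per-redshift HDF5 reads:

```python
import numpy as np

def streaming_global_mean(load_snapshot, n_z: int) -> np.ndarray:
    """Spatial mean per redshift, holding one 3D grid in RAM at a time."""
    means = np.empty(n_z)
    for i in range(n_z):
        grid = load_snapshot(i)     # one (Ncell, Ncell, Ncell) array
        means[i] = grid.mean()
    return means

# Stand-in for two on-disk snapshots with constant values 7 and 9.
fake_snapshots = [np.full((4, 4, 4), z) for z in (7.0, 9.0)]
means = streaming_global_mean(lambda i: fake_snapshots[i], 2)
print(means)  # [7. 9.]
```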

load_grids()[source]#

Materialise all lazy grid fields into 4D numpy arrays.

After this call every grid field (Grid_xHII, Grid_Temp, etc.) is a plain (n_z, Ncell, Ncell, Ncell) numpy array held entirely in RAM. Useful when you need repeated random access across all redshifts or want to pass data to code that expects a numpy array.

If a field is already a numpy array it is left untouched.

Returns:

For convenient chaining, e.g. cube = TemporalCube.read(...).load_grids().

Return type:

self

power_spectrum(quantity: numpy.ndarray, parameters: Parameters) tuple[numpy.ndarray, numpy.ndarray][source]#

Compute 1D power spectra for a given grid quantity over all z.

Parameters:
  • quantity (np.ndarray) – Array shaped (z, nx, ny, nz) to analyse.

  • parameters (Parameters) – Simulation parameters providing kbins and Lbox.

Returns:

(power_spectrum, bins) where power_spectrum has shape (n_z, n_k) and bins are the k-bin edges.

Return type:

tuple

redshift_of_reionization(ionization_fraction: float = 0.5) int[source]#

Return the redshift index where the volume-averaged ionization crosses a threshold.

Parameters:

ionization_fraction (float) – Threshold volume-averaged ionization fraction. Default 0.5.

Returns:

Index in the time/redshift array corresponding to the crossing.

Return type:

int
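The crossing detection amounts to finding the first snapshot whose mean ionization reaches the threshold; the exact interpolation or ordering convention is an assumption here:

```python
import numpy as np

# Volume-averaged xHII per snapshot, ordered in time (increasing ionization).
xHII_mean = np.array([0.02, 0.10, 0.35, 0.62, 0.95])
idx = int(np.argmax(xHII_mean >= 0.5))   # first index at or above threshold
print(idx)  # 3
```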

snapshot(z: int | float) CoevalCube[source]#

Load a single redshift snapshot as a CoevalCube.

Only the HDF5 file for that redshift is opened, so memory usage is one 3D snapshot. Derived quantities (e.g. Grid_dTb) are computed from 3D arrays instead of the full 4D multi-z arrays.

Parameters:

z (int | float) – Redshift index (int) or redshift value (float). When a float is given, the nearest available snapshot is returned.

Returns:

Snapshot for the requested redshift.

Return type:

CoevalCube
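The int-versus-float dispatch can be sketched as a nearest-redshift lookup over z_snapshots (the resolution logic shown is assumed, not copied from beorn):

```python
import numpy as np

z_snapshots = np.array([7.0, 7.5, 8.0, 15.0])  # redshifts available on disk

def resolve_index(z):
    """Int -> direct index; float -> index of the nearest snapshot."""
    if isinstance(z, int):
        return z
    return int(np.argmin(np.abs(z_snapshots - z)))

print(resolve_index(2))    # 2 (direct index, z=8.0)
print(resolve_index(7.6))  # 1 (nearest snapshot is z=7.5)
```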

snapshot_path(z: float) Path[source]#

Return the path for the per-redshift file CoevalCube_z{z:.3f}.h5.
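The file-name part of that path is just the z{z:.3f} format string documented above:

```python
# Per-redshift file name as described in the TemporalCube layout.
z = 7.5
name = f"CoevalCube_z{z:.3f}.h5"
print(name)  # CoevalCube_z7.500.h5
```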

write(**kwargs)[source]#

Write the content of this dataclass into an HDF5 file. Can be called without any arguments to write to the default file path, or with a specific file path, or with additional keyword arguments to customize the file name.

property z: numpy.ndarray#

Alias for z_snapshots — kept for backward compatibility.

z_snapshots: numpy.ndarray = None#

Sorted array of redshifts for which snapshots are available on disk.