atomica.results

Implements an interface for working with model outputs

The Result class is a wrapper for a Model instance, providing methods to conveniently access, plot, and export model outputs.

Functions

export_results(results[, filename, …]) Export Result outputs to a file

Classes

Ensemble([mapping_function, name, …]) Class for working with sampled Results
Result(model[, parset, name]) Storage for single simulation result
class atomica.results.Ensemble(mapping_function=None, name=None, baseline_results=None, **kwargs)

Class for working with sampled Results

This class facilitates working with results and sampling. It manages the mapping of sets of results onto a scalar, which is then accumulated over samples. For example, we might sample from a ParameterSet and then run simulations with 2 different allocations to compare their expected difference. The Ensemble contains:

  • A reduction function that maps from Results^N => R^M where typically M would index
Parameters:
  • mapping_function – A function that takes in a Result, or a list/dict of Results, and returns a single PlotData instance
  • name (Optional[str]) – Name for the Ensemble (will appear on plots)
  • baseline_results – Optionally provide the non-sampled results at instantiation
  • kwargs – Additional arguments to pass to the mapping function
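The mapping-function contract above can be illustrated with a minimal, self-contained sketch. It uses a hypothetical `FakeResult` stand-in instead of a real atomica Result, and returns a plain dict where a real mapping function would return a PlotData instance. The function reduces a dict of two budget-scenario results (the 'baseline' and 'scenario' keys are hypothetical) to their per-timestep difference, and the `output` keyword plays the role of the extra kwargs forwarded by the Ensemble.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in for atomica Result: maps output name -> values per time."""
    name: str
    outputs: dict = field(default_factory=dict)


def difference_mapping(results, output="alive"):
    """Hypothetical mapping function reducing two Results to one series.

    Takes a dict with 'baseline' and 'scenario' results and returns the
    per-timestep difference for one output. An Atomica mapping function would
    wrap this in a PlotData instance instead of returning a plain dict.
    """
    base = results["baseline"].outputs[output]
    scen = results["scenario"].outputs[output]
    return {"output": output, "values": [s - b for s, b in zip(scen, base)]}


sample = {
    "baseline": FakeResult("baseline", {"alive": [100.0, 110.0, 120.0]}),
    "scenario": FakeResult("scenario", {"alive": [100.0, 115.0, 130.0]}),
}
reduced = difference_mapping(sample, output="alive")
print(reduced["values"])  # [0.0, 5.0, 10.0]
```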
_get_series()

Flatten the series in samples

The Ensemble contains a list of PlotData instances, each containing a list of Series. Computing uncertainty requires iterating over the Series for a particular result, pop, and output. This function returns a dict keyed by a (result, pop, output) tuple with a list of references to the underlying Series. Thus, the Series can be accessed either via self.samples, which facilitates adding new samples, or via self._get_series(), which facilitates computing uncertainties after the fact.

Return type:dict
Returns:A dict keyed by result-pop-output containing lists of sampled Series
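The flattening described above can be sketched in plain Python. This assumes each sample is simply a list of Series objects (standing in for PlotData.series) and that each Series carries result, pop, and output attributes; the `Series` dataclass here is a hypothetical stand-in, not Atomica's class. Because the dict stores references rather than copies, both views share the same underlying objects.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Series:
    """Hypothetical stand-in for one result/pop/output trace."""
    result: str
    pop: str
    output: str
    vals: list


def get_series(samples):
    """Flatten samples (each a list of Series) into (result, pop, output) -> [Series, ...]."""
    flat = defaultdict(list)
    for plotdata in samples:
        for s in plotdata:
            # Store a reference, not a copy, so both views share the same objects
            flat[(s.result, s.pop, s.output)].append(s)
    return dict(flat)


samples = [
    [Series("r1", "adults", "alive", [1.0, 2.0]), Series("r1", "kids", "alive", [3.0, 4.0])],
    [Series("r1", "adults", "alive", [1.5, 2.5]), Series("r1", "kids", "alive", [3.5, 4.5])],
]
flat = get_series(samples)
print(len(flat[("r1", "adults", "alive")]))  # 2
```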
add(results, **kwargs)

Add a sample to the Ensemble

This function takes in Results and optionally any other arguments needed by the Ensemble’s mapping function. It calls the mapping function and adds the resulting PlotData instance to the list of samples.

Parameters:
  • results – A Result, or list/dict of Results, as supported by the mapping function
  • kwargs – Any additional keyword arguments to pass to the mapping function
Return type:

None

baseline = None

A single PlotData instance with reference values (i.e. outcome without sampling)

boxplot(fig=None, years=None, results=None, outputs=None, pops=None)

Render a box plot

This is effectively an alternate approach to rendering the kernel density estimates for the distributions. The figure will have a box plot showing quantiles as whiskers for each quantity selected, filtered by the results, outputs, and pops arguments.

Parameters:
  • fig – Optionally specify an existing figure to plot into
  • years – Optionally specify years - otherwise, first time point will be used
  • results – Optionally specify list of result names
  • outputs – Optionally specify list of outputs
  • pops – Optionally specify list of pops
Returns:

A matplotlib figure (note that this method will only ever return a single figure)

mapping_function = None

This function is called by Ensemble.add() and Ensemble.set_baseline()

n_samples

Return number of samples present

Return type:int
Returns:Number of samples contained in the Ensemble
outputs

Return a list of outputs

The outputs are retrieved from the first sample, or from the baseline if no samples are present yet. If neither is present, an empty list is returned.

It is generally assumed that the baseline and all samples should have the same outputs and populations, because they should all have been generated with the same mapping function.

Return type:list
Returns:A list of outputs (strings)
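The fallback order above can be sketched as follows. Representing each PlotData as a plain dict with an 'outputs' list is an assumption for illustration, and `first_available_outputs` is a hypothetical helper; the same lookup pattern is described for pops, results, and tvec.

```python
def first_available_outputs(samples, baseline):
    """Return outputs from the first sample, else the baseline, else an empty list.

    samples: list of dicts with an 'outputs' key (stand-ins for PlotData)
    baseline: a single such dict, or None
    """
    if samples:
        return list(samples[0]["outputs"])
    if baseline is not None:
        return list(baseline["outputs"])
    return []
```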
pairplot(year=None, outputs=None, pops=None)
plot_bars(fig=None, years=None, results=None, outputs=None, pops=None, order=('years', 'results', 'outputs', 'pops'), horizontal=False, offset=None)

Render a bar plot

This is very similar to a box plot. The bar plot with error bars does not support stacking, because stacked bars with error bars can be misleading: the errors apply cumulatively within the bar.

If an existing figure is provided, this function will attempt to add to the existing figure by offsetting the new bars relative to the current axis limits. This is intended to facilitate comparing bar plots across multiple Ensembles.

Parameters:
  • fig – Optionally specify an existing figure to plot into
  • years – Optionally specify years - otherwise, first time point will be used. Data is interpolated onto this year
  • results – Optionally specify list of result names
  • outputs – Optionally specify list of outputs
  • pops – Optionally specify list of pops
  • order – An iterable specifying the order in which bars appear - should be a permutation of ('years','results','outputs','pops')
  • horizontal – If True, bar plot will be horizontal
  • offset (Optional[float]) – Offset value to apply to the position of the bar. If None, will be automatically determined based on existing plot contents.
Returns:

A matplotlib figure (note that this method will only ever return a single figure)
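One way to read the `order` argument is that the first entry varies slowest when laying out bars. The sketch below enumerates bars that way with itertools.product; `bar_order` is a hypothetical helper illustrating that interpretation, and Atomica's actual grouping, spacing, and offset logic may differ.

```python
import itertools


def bar_order(selection, order=("years", "results", "outputs", "pops")):
    """Enumerate bars so that the first key in `order` varies slowest.

    selection: dict mapping each of 'years', 'results', 'outputs', 'pops' to a list
    order: a permutation of those four keys
    Returns a list of dicts, one per bar, in plotting order.
    """
    if sorted(order) != sorted(selection):
        raise ValueError("order must be a permutation of the selection keys")
    return [dict(zip(order, combo)) for combo in itertools.product(*(selection[k] for k in order))]


sel = {"years": [2020], "results": ["base", "scen"], "outputs": ["alive"], "pops": ["adults", "kids"]}
for bar in bar_order(sel, order=("pops", "results", "years", "outputs")):
    print(bar["pops"], bar["results"])
# adults base
# adults scen
# kids base
# kids scen
```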

plot_distribution(year=None, fig=None, results=None, outputs=None, pops=None)

Plot a kernel density distribution

This method will plot kernel density estimates for all outputs and populations in the Ensemble.

The PlotData instances stored in the Ensemble could contain more than one output/population. To facilitate superimposing Ensembles, by default they will all be plotted into the figure. Specifying a string or list of strings for the outputs and pops will select a subset of the quantities to plot. Most of the time, an Ensemble would only have one output/pop, so it probably wouldn’t matter.

Parameters:
  • year (Optional[float]) – If None, plots the first time index, otherwise, interpolate to the target year
  • fig – Optionally specify a figure handle to plot into
  • results – Optionally specify list of result names
  • outputs – Optionally specify list of outputs
  • pops – Optionally specify list of pops
Returns:

A matplotlib figure (note that this method will only ever return a single figure)
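The kernel density estimate itself can be sketched with the standard library. The documentation does not state which kernel or bandwidth Atomica uses, so this sketch assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth, a common textbook choice; `kde_gaussian` is a hypothetical helper and the sample values are made up for illustration.

```python
import math
import statistics


def kde_gaussian(values, grid):
    """Evaluate a Gaussian kernel density estimate of `values` at each point in `grid`.

    Bandwidth uses Silverman's rule of thumb: h = 1.06 * sigma * n**(-1/5).
    """
    n = len(values)
    sigma = statistics.stdev(values)
    h = 1.06 * sigma * n ** (-1 / 5)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
        for x in grid
    ]


samples = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9]  # e.g. one output at one year across samples
grid = [9.0 + 0.1 * i for i in range(21)]
density = kde_gaussian(samples, grid)
```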

plot_series(fig=None, style='quartile', results=None, outputs=None, pops=None)

Plot a time series with uncertainty

Parameters:
  • fig – Optionally specify the figure to render into
  • style – Specify whether to plot transparent lines (‘samples’), or shaded areas for uncertainty. For shaded areas, the style can be ‘std’, ‘ci’, or ‘quartile’ depending on how the size of the area should be computed
  • results – Select specific results to display
  • outputs – Select specific outputs to display
  • pops – Select specific populations to display
Returns:

The figure object that was rendered into
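The three shaded styles differ only in how the band limits are computed at each time point. The definitions used below (mean ± one standard deviation for 'std', the 2.5th to 97.5th percentiles for 'ci', and the interquartile range for 'quartile') are assumptions for illustration, not taken from Atomica's source; `uncertainty_band` is a hypothetical helper.

```python
import statistics


def uncertainty_band(samples, style="quartile"):
    """Return (lower, upper) lists, one pair per time point.

    samples: list of equal-length lists, one per sampled simulation.
    Assumed style definitions:
      'std'      -> mean +/- one standard deviation
      'ci'       -> 2.5th to 97.5th percentile
      'quartile' -> 25th to 75th percentile
    """
    lower, upper = [], []
    for column in zip(*samples):  # values across samples at one time point
        if style == "std":
            mu, sd = statistics.mean(column), statistics.stdev(column)
            lo, hi = mu - sd, mu + sd
        elif style == "ci":
            cuts = statistics.quantiles(column, n=40, method="inclusive")
            lo, hi = cuts[0], cuts[-1]
        elif style == "quartile":
            q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
            lo, hi = q1, q3
        else:
            raise ValueError(f"Unknown style: {style!r}")
        lower.append(lo)
        upper.append(hi)
    return lower, upper


runs = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]  # 5 samples, 2 time points
print(uncertainty_band(runs, "quartile"))  # ([2.0, 20.0], [4.0, 40.0])
```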

pops

Return a list of populations

The pops are retrieved from the first sample, or from the baseline if no samples are present yet. If neither is present, an empty list is returned.

It is generally assumed that the baseline and all samples should have the same outputs and populations, because they should all have been generated with the same mapping function.

Return type:list
Returns:A list of population names (strings)
results

Return a list of result names

The result names are retrieved from the first sample, or from the baseline if no samples are present yet. If neither is present, an empty list is returned.

It is generally assumed that the results will all have the same name in the case that this Ensemble contains multiple PlotData samples. Otherwise, a KeyError may occur.

Return type:list
Returns:A list of result names (strings)
run_sims(proj, parset, progset=None, progset_instructions=None, result_names=None, n_samples=1, parallel=False, max_attempts=None)

Run and store sampled simulations

Use this method to perform sampling if there is insufficient memory available to store all simulations prior to inserting them into the Ensemble. This method adds Results to the Ensemble one at a time, so the memory required never exceeds that needed to hold the Results passed to a single call of the mapping function (typically either 1, or the number of budget scenarios being compared).

Note that a separate function, _sample_and_map(), performs the conversion to PlotData. This ensures that data reduction happens on the parallel workers, so that multiprocessing only accumulates PlotData rather than Result instances.

Parameters:
  • proj – A Project instance
  • n_samples (int) – An integer number of samples
  • parset – A ParameterSet instance
  • progset – Optionally a ProgramSet instance
  • progset_instructions – Optionally, program instructions, or a list of instructions (e.g. one per budget scenario)
  • result_names – Optionally specify names for each result. The most common usage would be when passing in a list of program instructions corresponding to different budget scenarios. The result names should be a list the same length as the instructions, or containing a single element if not using programs.
  • parallel – If True, run simulations in parallel (on Windows, the calling code must be guarded by if __name__ == '__main__')
  • max_attempts – Number of retry attempts for bad initializations
Return type:

None
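The memory-saving pattern described above (reduce each Result as soon as it is produced and keep only the reduced output) can be sketched serially. Everything here is hypothetical: `simulate` stands in for running a sampled ParameterSet, `BadInitialization` stands in for whatever failure triggers a retry, and treating max_attempts as the total number of tries per sample (with None meaning unlimited) is an assumption.

```python
import random


class BadInitialization(Exception):
    """Hypothetical error for a sampled parameter set that fails to initialize."""


def run_sims_sketch(simulate, mapping_function, n_samples, max_attempts=None, seed=0):
    """Serial sketch of memory-constrained sampling.

    simulate(rng) -> one sampled result (may raise BadInitialization)
    mapping_function(result) -> reduced output; only this is kept
    max_attempts: total tries per sample; None retries indefinitely (assumption)
    """
    rng = random.Random(seed)
    reduced = []
    for _ in range(n_samples):
        attempts = 0
        while True:
            attempts += 1
            try:
                result = simulate(rng)
                break
            except BadInitialization:
                if max_attempts is not None and attempts >= max_attempts:
                    raise
        # Only the reduced value is stored; `result` is rebound on the next sample
        reduced.append(mapping_function(result))
    return reduced


def simulate(rng):
    """Stand-in for one sampled simulation; very negative draws 'fail'."""
    x = rng.gauss(0.0, 1.0)
    if x < -1.5:
        raise BadInitialization
    return {"alive": [100 + 10 * x, 110 + 10 * x]}


final_alive = run_sims_sketch(simulate, lambda r: r["alive"][-1], n_samples=5)
```

In the parallel case the same reduction would run inside each worker, which is the role _sample_and_map() plays.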

samples = None

A list of PlotData instances, one for each sample

set_baseline(results, **kwargs)

Add a baseline to the Ensemble

This function assigns a special result corresponding to the unsampled case as a reference. This result can be rendered in a different way on plots - for example, as a vertical line on density estimates, or a solid line on a time series plot.

Parameters:
  • results – A Result, or list/dict of Results, as supported by the mapping function
  • kwargs – Any additional keyword arguments to pass to the mapping function
Return type:

None

summary_statistics(years=None, results=None, outputs=None, pops=None)
tvec

Return time vector

The time vector is retrieved from the first sample, or from the baseline if no samples are present yet. If neither is present, an empty list is returned.

Return type:numpy.ndarray
Returns:A time array from one of the stored PlotData instances
update(result_list, **kwargs)

Add multiple samples to the Ensemble

The split between add() and update() parallels the behaviour of Python’s built-in sets, where set.add() adds a single item and set.update() adds multiple items. This function is intended for cases where the user has stored multiple samples in memory and wants to construct Ensembles dynamically after the fact.

The input list here is an iterable, and Ensemble.add() gets called on every item in the list. It is up to the mapping function then to handle whether the items in result_list are single Result instances or lists/tuples/dicts of Results.

Parameters:
  • result_list – A list of samples, as supported by the mapping function (i.e. the individual items would work with Ensemble.add())
  • kwargs – Any additional keyword arguments to pass to the mapping function
Return type:

None
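The relationship between add(), update(), and set_baseline() can be illustrated with a stripped-down, hypothetical `MiniEnsemble` class. It keeps only the bookkeeping (no plotting), stores plain values instead of PlotData, and assumes that keyword arguments given at construction are merged with per-call keyword arguments; that merging rule is an assumption, not documented Atomica behaviour.

```python
class MiniEnsemble:
    """Hypothetical, stripped-down analogue of Ensemble's sample bookkeeping."""

    def __init__(self, mapping_function, **kwargs):
        self.mapping_function = mapping_function
        self.kwargs = kwargs  # default arguments forwarded to the mapping function
        self.samples = []
        self.baseline = None

    def _map(self, results, **kwargs):
        # Call-specific kwargs override the ones given at construction (assumption)
        return self.mapping_function(results, **{**self.kwargs, **kwargs})

    def add(self, results, **kwargs):
        self.samples.append(self._map(results, **kwargs))

    def update(self, result_list, **kwargs):
        # Like set.update(): add() is applied to every item of the iterable
        for results in result_list:
            self.add(results, **kwargs)

    def set_baseline(self, results, **kwargs):
        self.baseline = self._map(results, **kwargs)

    @property
    def n_samples(self):
        return len(self.samples)


def total(result, scale=1.0):
    """Toy mapping function: scaled sum of a list of numbers."""
    return scale * sum(result)


ens = MiniEnsemble(total, scale=2.0)
ens.set_baseline([1, 2, 3])
ens.add([1, 1, 1])
ens.update([[2, 2], [3, 3]])
print(ens.n_samples, ens.samples, ens.baseline)  # 3 [6.0, 8.0, 12.0] 12.0
```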

class atomica.results.Result(model, parset=None, name=None)

Storage for single simulation result

A Result object (similar to the raw_result in Optima HIV) stores a complete simulation run. In Atomica, a Result is a lightweight wrapper around a Model object. During a simulation, the Model object contains integration objects like compartments, links, and parameters, which store values for each quantity at every time step. The methods in the Model class are oriented at performing the calculations required to simulate the model. A Result object contains within it a single Model object, which in turn contains all of the integration objects together with the data they contain and the relationships between them, as well as the programs.ProgramSet and programs.ProgramInstructions that were used to perform the simulation. The methods of the Result class are oriented at plotting and exporting.

Parameters:
  • model – A single model.Model instance (after integration)
  • parset – A parameters.ParameterSet instance
  • name – The name to use for the new Result object
budget(year=None)

Return budget at a given year

This will return the per-year spending rate taking into account any budget scenarios that are present.

Parameters:year – Optionally specify a time or array of times. Otherwise, use all times
Returns:A dict keyed by program name containing arrays of spending values
charac_names(pop_name)

Return list of characteristic names

This method returns all of the characteristic names available within a specified population

Parameters:pop_name (str) – The name of one of the populations in the Result
Return type:list
Returns:List of characteristic code names
comp_names(pop_name)

Return compartment names within a population

Parameters:pop_name (str) – The code name of one of the populations
Return type:list
Returns:List of code names of all compartments within that population
dt

Return simulation timestep

Return type:float
Returns:The simulation timestep (scalar)
export_raw(filename=None)

Save raw outputs

This method produces a single Pandas DataFrame with all of the raw model values, and then optionally saves it to an Excel file.

Parameters:filename – The file name of the Excel file to write. If not provided, no file will be written
Return type:DataFrame
Returns:A DataFrame with all model outputs
framework

Return framework from Result

Returns:A ProjectFramework instance
get_alloc(year=None)

Return spending allocation

If the result was generated using programs, this method

Parameters:year – Optionally specify a scalar or list/array of years to return budget values for. Otherwise, uses all simulation times
Return type:dict
Returns:Dictionary keyed by program name with arrays of spending values
get_coverage(quantity='fraction', year=None)

Return program coverage

This function is the primary function to use when wanting to query coverage values. All coverage quantities are accessible via the Result object because the compartment sizes and thus eligible people are known.

Parameters:
  • quantity (str) – One of:
      ◦ ‘capacity’ – Program capacity in units of ‘people/year’ (for all types of programs)
      ◦ ‘eligible’ – The number of people eligible for the program (coverage denominator) in units of ‘people’
      ◦ ‘fraction’ – capacity/eligible, the fraction coverage (maximum value is 1.0); this quantity is dimensionless
      ◦ ‘number’ – The number of people covered (fraction*eligible), returned in units of ‘people/year’
  • year – Optionally specify a scalar or list/array of years to return coverage values for. Otherwise, uses all simulation times
Return type:

dict

Returns:

Requested values in dictionary {prog_name:value} in requested years
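The relationships among the coverage quantities reduce to simple arithmetic. This sketch assumes the fraction is clipped at 1.0 (based on the documented maximum) and that a zero eligible population yields zero coverage; both are assumptions, and `coverage_quantities` is a hypothetical helper rather than part of Atomica.

```python
def coverage_quantities(capacity, eligible):
    """Derive the four coverage quantities for one program at one time.

    capacity: program capacity
    eligible: number of people eligible (coverage denominator)
    """
    # Fraction coverage, clipped at 1.0; zero eligible -> zero coverage (assumption)
    fraction = min(capacity / eligible, 1.0) if eligible > 0 else 0.0
    return {
        "capacity": capacity,
        "eligible": eligible,
        "fraction": fraction,
        "number": fraction * eligible,  # number covered = fraction * eligible
    }


print(coverage_quantities(500.0, 2000.0)["fraction"])   # 0.25
print(coverage_quantities(3000.0, 2000.0)["number"])    # 2000.0
```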

get_variable(name, pops=None)

Retrieve integration objects

This method retrieves an integration object from the model for a given population. It serves as a shortcut for model.Population.get_variable() by incorporating the population lookup in the same step.

Parameters:
  • pops (Optional[str]) – The name of a population
  • name (str) – The name of a variable
Return type:

list

Returns:

A list of matching variables (integration objects)
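The population-aware lookup can be sketched with nested dicts standing in for the model's populations and integration objects. Searching every population when pops is None is an assumption about the default behaviour; the function and data layout are hypothetical.

```python
def get_variable(model_pops, name, pops=None):
    """Hypothetical sketch of a population-aware variable lookup.

    model_pops: dict pop_name -> dict var_name -> list of integration objects
    pops: a single population name, or None to search every population (assumption)
    """
    pop_names = list(model_pops) if pops is None else [pops]
    matches = []
    for pop_name in pop_names:
        # Populations that do not define the variable contribute nothing
        matches.extend(model_pops[pop_name].get(name, []))
    return matches


model_pops = {
    "adults": {"alive": ["adults.alive"]},
    "kids": {"alive": ["kids.alive"], "sus": ["kids.sus"]},
}
print(get_variable(model_pops, "alive"))  # ['adults.alive', 'kids.alive']
```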

Return list of link names

This method returns all of the link names available within a specified population. The names will be unique (so duplicate links will only appear once in the list of names)

Parameters:pop_name (str) – The name of one of the populations in the Result
Return type:list
Returns:List of link code names
model = None

A completed model run that serves as primary storage for the underlying values

par_names(pop_name)

Return list of parameter names

This method returns all of the parameter names available within a specified population

Parameters:pop_name (str) – The name of one of the populations in the Result
Return type:list
Returns:List of parameter code names
parset_name = None

The name of the ParameterSet that was used for the simulation

plot(plot_name=None, plot_group=None, pops=None, project=None)

Produce framework-defined plot

This method plots a single Result instance using the plots defined in the framework.

If plot_group is not None, then plot_name is ignored. If plot_name and plot_group are both None, then all plots will be displayed.

Parameters:
  • plot_name – The name of a single plot in the Framework
  • plot_group – The name of a plot group
  • pops – A population aggregation supported by PlotData (e.g. ‘all’)
  • project – A Project instance used to plot data and full names
Returns:

List of figure objects
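The precedence rules for plot_name and plot_group can be expressed as a small selection function. The list-of-dicts representation of the framework's plots and the `select_plots` helper are hypothetical stand-ins.

```python
def select_plots(plots, plot_name=None, plot_group=None):
    """Choose which framework plots to render.

    plots: list of dicts with 'name' and 'group' keys
    """
    if plot_group is not None:
        # plot_group takes precedence; plot_name is ignored
        return [p for p in plots if p["group"] == plot_group]
    if plot_name is not None:
        return [p for p in plots if p["name"] == plot_name]
    # Neither given: render every plot
    return list(plots)


plots = [
    {"name": "prev", "group": "epi"},
    {"name": "inc", "group": "epi"},
    {"name": "spend", "group": "progs"},
]
print([p["name"] for p in select_plots(plots, plot_name="spend", plot_group="epi")])  # ['prev', 'inc']
```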

pop_labels

Return all population full names

The full names/labels are returned in the same order as the names in Result.pop_names

Returns:List of population full names
pop_names = None

A list of the population names present. This is used frequently, so it is stored as an attribute rather than computed on demand

t

Return simulation time vector

Return type:numpy.ndarray
Returns:Array of all time points available in the result
used_programs

Flag whether programs were used or not

Return type:bool
Returns:True if a progset and program instructions were present. Note that programs will be considered active even if the start/stop years in the instructions don’t overlap the simulation years (so no overwrite actually took place).
atomica.results._cascade_to_df(results, cascade_name, tvals)

Return a DataFrame for a cascade for a group of results

The DataFrame will have a three-level MultiIndex for the result, population, and cascade stage.

Parameters:
  • results – List of Results
  • cascade_name – The name or index of a cascade stage (interpretable by get_cascade_vals)
  • tvals – Outputs will be interpolated onto the times in this array (typically would be annual)
Returns:

A DataFrame

atomica.results._filter_pops_by_output(result, output)

Helper function for plotting quantities

With population types, a given output/output aggregation may only be defined in a subset of populations. To deal with this when plotting Result objects, it’s necessary to work out which populations the requested output aggregation can be plotted in. This function takes in an output definition and returns a list of matching populations.

Parameters:output – An output aggregation string, e.g. ‘alive’ or ‘:ddis’ or {[‘lt_inf’,‘lteu’]} (supported by PlotData/get_variable)
Return type:list
Returns:A list of population code names
atomica.results._output_to_df(results, output_name, output, tvals)

Convert an output to a DataFrame for a group of results

This function takes in a list of results, and an output specification recognised by PlotData. It extracts the outputs from all results and stores them in a 3-level MultiIndexed dataframe, which is returned. The index levels are the name of the output, the name of the results, and the populations.

In addition, this function attempts to aggregate the outputs across populations if the units of the outputs match known units. If the units imply an obvious choice of summation or weighted averaging, it will be used. Otherwise, the output will contain NaNs for the population-aggregated results, which will appear as empty cells in the Excel spreadsheet so that users can fill them in themselves.

Parameters:
  • results – List of Results
  • output_name (str) – The name to use for the output quantity
  • output – An output specification/aggregation supported by PlotData
  • tvals – Outputs will be interpolated onto the times in this array (typically would be annual)
Return type:

DataFrame

Returns:

A DataFrame
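The unit-based aggregation rule can be sketched for a single time point. The unit strings, and which ones map to summation versus population-weighted averaging, are hypothetical examples, since the specific units Atomica recognises are not listed here; unrecognised units yield NaN, matching the blank-cell behaviour described above.

```python
import math


def aggregate_over_pops(values, sizes, units):
    """Aggregate per-population values of one output at one time point.

    values: dict pop -> value of the output
    sizes: dict pop -> population size (weights for averaging)
    units: unit string of the output (hypothetical unit names below)
    """
    if units in {"number", "people"}:
        return sum(values.values())  # counts add up across populations
    if units in {"fraction", "probability", "proportion"}:
        total = sum(sizes[p] for p in values)
        return sum(values[p] * sizes[p] for p in values) / total  # size-weighted mean
    return math.nan  # no obvious rule: leave blank for the user to fill in
```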

atomica.results._programs_to_df(results, prog_name, tvals)

Return a DataFrame for program outputs for a group of results

The dataframe will have a three-level MultiIndex for the program, result, and program quantity (e.g. spending, coverage fraction)

Parameters:
  • results – List of Results
  • prog_name – The name of a program
  • tvals – Outputs will be interpolated onto the times in this array (typically would be annual)
Returns:

A DataFrame

atomica.results._sample_and_map(proj, parset, progset, progset_instructions, result_names, mapping_function, max_attempts, **kwargs)

Helper function to sample

This function runs a sampled simulation and also calls an Ensemble’s mapping function prior to returning. This means that the Result goes out of scope and is discarded. Used when performing parallel simulations via Ensemble.run_sims() (which is used for memory-constrained simulations)

atomica.results._write_df(writer, formats, sheet_name, df, level_ordering)

Write a MultiIndexed DataFrame into a worksheet as a set of tables

Parameters:
  • writer – A Pandas ExcelWriter instance specifying the file to write into
  • formats – The output of standard_formats(workbook) specifying the styles embedded in the workbook
  • sheet_name – The name of the sheet to create. It is assumed that this sheet will be generated entirely by this function call (i.e. the sheet is not already present)
  • df – A DataFrame that has a MultiIndex
  • level_ordering – Tuple of index level names. Split the dataframe such that a separate table is generated for the first level, and then the rows are reordered by the remaining levels and the original sort order is preserved. The contents of this tuple need to match the levels present in the dataframe
Returns:

None
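The level_ordering behaviour can be sketched in pure Python, using plain index tuples in place of a pandas MultiIndex and omitting the Excel writing entirely. The sketch assumes that "the original sort order is preserved" means each level's values keep the order in which they first appear; `split_and_order` is a hypothetical helper.

```python
def split_and_order(rows, level_names, level_ordering):
    """Group index tuples into tables according to level_ordering.

    rows: list of tuples, each an index tuple in the order of level_names
    level_ordering: permutation of level_names; the first level splits tables
    Returns dict first-level value -> list of rows, with the remaining levels
    ordered by the order in which their values first appear in `rows`.
    """
    positions = [level_names.index(name) for name in level_ordering]
    # Rank each value by first appearance so the original sort order is preserved
    first_seen = [{} for _ in level_names]
    for row in rows:
        for i, value in enumerate(row):
            first_seen[i].setdefault(value, len(first_seen[i]))
    tables = {}
    for row in rows:
        tables.setdefault(row[positions[0]], []).append(row)
    for table_rows in tables.values():
        table_rows.sort(key=lambda r: [first_seen[p][r[p]] for p in positions[1:]])
    return tables


rows = [(o, r, p) for o in ("alive", "inf") for r in ("base", "scen") for p in ("adults", "kids")]
tables = split_and_order(rows, ("output", "result", "pop"), ("output", "pop", "result"))
for row in tables["alive"]:
    print(row)
```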

atomica.results.export_results(results, filename=None, output_ordering=('output', 'result', 'pop'), cascade_ordering=('pop', 'result', 'stage'), program_ordering=('program', 'result', 'quantity'))

Export Result outputs to a file

This function writes an XLSX file with the data corresponding to any Cascades or Plots that are present. Note that results are exported for each integer year, and flow rates are annualized instantaneously. For example, a flow will have values for 2014, 2015, 2016, and so on, but the 2015 value is the actual flow at 2015.0 divided by dt, not the flow aggregated over the year. Time-aggregation isn’t appropriate here because many of the quantities plotted are probabilities. Selecting the annualized value at a particular year also means that the exported data will match any plots generated from within Atomica.
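A worked example, with made-up numbers, of the difference between the instantaneous annualized value that is exported and a time-aggregated total. It assumes a fixed timestep starting at t0 and that each entry of `flows` is the number of people moved during one step; both helpers are hypothetical.

```python
def annualized_at(t, flows, dt, t0):
    """Annualized flow at time t: the per-step flow at t divided by dt."""
    index = round((t - t0) / dt)
    return flows[index] / dt


def aggregated_over_year(year, flows, dt, t0):
    """Total flow over [year, year + 1): summed per-step flows (not what is exported)."""
    start = round((year - t0) / dt)
    steps_per_year = round(1 / dt)
    return sum(flows[start:start + steps_per_year])


t0, dt = 2014.0, 0.25
flows = [4.0, 4.0, 5.0, 5.0, 5.0, 6.0, 7.0, 8.0]  # people moved per step, 2014.0 to 2015.75
exported = annualized_at(2015.0, flows, dt, t0)        # 5.0 / 0.25 = 20.0
aggregated = aggregated_over_year(2015.0, flows, dt, t0)  # 5 + 6 + 7 + 8 = 26.0
```

Here the exported 2015 value (20 people/year) reflects only the step at 2015.0, whereas summing the four steps within 2015 would give 26.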

Optionally, a list/set of names of the plots/cascades to include in the export can be specified. Set it to an empty list to omit that category, e.g.

>>> plot_names = None # export all plots in framework
>>> plot_names = ['a','b'] # export only plots 'a' and 'b'
>>> plot_names = [] # don't export any plots e.g. to only export cascades
Parameters:
  • results – A Result, or list of Results. Results must all have different names. Outputs are drawn from the first result, normally all results would have the same framework and populations.
  • filename – Name of the Excel file to write. If None, no file will be written (but DataFrames will be returned)
  • output_ordering – A tuple specifying the grouping of outputs, results, and pops for the Plots and targetable parameters sheet. The first item in the tuple will split the dataframes into separate tables. Then, within the tables, rows will be grouped by the second item.
  • cascade_ordering – A similar tuple specifying the groupings for the cascade sheets. The cascade tables are always split by cascade first, so the order in this tuple only affects the column ordering
  • program_ordering – A similar tuple specifying the groupings for the program sheets
Returns:

The name of the file that was written