SampleSet

class matk.sampleset.SampleSet(name, samples, parent, index_start=1, **kwargs)

MATK SampleSet class - Stores information related to a sample including parameter samples, associated responses, and sample indices

corner(bins=20, range=None, weights=None, color=u'k', smooth=None, smooth1d=None, labels=None, label_kwargs=None, show_titles=False, title_fmt=u'.2f', title_kwargs=None, truths=None, truth_color=u'#4682b4', scale_hist=False, quantiles=None, verbose=False, fig=None, max_n_ticks=5, top_ticks=False, use_math_text=False, hist_kwargs=None, **hist2d_kwargs)

Draw a corner plot using the corner package written by Dan Foreman-Mackey (https://pypi.python.org/pypi/corner/1.0.0)

corr(type='pearson', plot=False, printout=True, plotvals=True, figsize=None, title=None)

Calculate correlation coefficients of parameters and responses

Parameters:

type (str) – Type of correlation coefficient (pearson by default, spearman also available)
plot (bool) – If True, plot correlation matrix
printout (bool) – If True, print correlation matrix with row and column headings
plotvals (bool) – If True, print correlation coefficients on plot matrix
figsize (tuple(fl64,fl64)) – Width and height of figure in inches
title (str) – Title of plot

Returns:

ndarray(fl64) – Correlation coefficients
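
The difference between the two coefficient types can be illustrated with a small standard-library sketch. It is not MATK's implementation: the helper names (pearson, ranks, spearman, corr_matrix) and the sample data are hypothetical, and plain lists stand in for the parameter and response arrays.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman coefficient: Pearson coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

def corr_matrix(pars, resps, kind="pearson"):
    """Rows follow the parameters, columns follow the responses."""
    f = pearson if kind == "pearson" else spearman
    return [[f(p, r) for r in resps.values()] for p in pars.values()]

# Hypothetical samples: one parameter, two responses
pars = {"k": [1.0, 2.0, 3.0, 4.0, 5.0]}
resps = {"cubic": [v ** 3 for v in pars["k"]],
         "neg": [-2.0 * v for v in pars["k"]]}

pearson_m = corr_matrix(pars, resps, "pearson")
spearman_m = corr_matrix(pars, resps, "spearman")
```

For the monotonic but nonlinear "cubic" response, the Spearman coefficient is 1 while the Pearson coefficient falls below 1, which is the usual reason to pass type='spearman'.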

index_start: Starting integer value for sample indices

indices: Array of sample indices

main_effects(): For each parameter, compile array of main effects.

name: Sample set name

obsnames: Array of observation names

panels(type='pearson', alpha=0.2, figsize=None, title=None, tight=False, symbol='.', fontsize=None, corrfontsize=None, ms=5, mins=None, maxs=None, frequency=False, bins=10, ylim=None, labels=[], filename=None, xticks=2, yticks=2)

Plot histograms, scatterplots, and correlation coefficients in paired matrix

Parameters:

type (str) – Type of correlation coefficient (pearson by default, spearman also available)
alpha (float) – Histogram color shading
figsize (tuple(fl64,fl64)) – Width and height of figure in inches
title (str) – Title of plot
tight (bool) – Use matplotlib tight layout
symbol (str) – matplotlib symbol for scatterplots
fontsize (fl64) – Size of font for axis labels
corrfontsize (fl64) – Size of font for correlation coefficients
ms (fl64) – Scatterplot marker size
frequency (bool) – If True, the first element of the return tuple will be the counts normalized by the length of data, i.e., n/len(x)
bins (int) – Number of bins in histograms
ylim (tuple) – 2-element tuple with y-axis limits for histograms
labels (lst(str)) – Names to use instead of parameter names in plot
filename (str) – Name of file to save plot to. The file ending determines the plot type (pdf, png, ps, eps, etc.). The plot types available depend on the matplotlib backend in use on the system. If a filename is given, the plot will not be displayed.
xticks (int) – Number of ticks along x axes
yticks (int) – Number of ticks along y axes

pardict(index)

Get parameter dictionary for sample with specified index

Parameters:	index (int) – Sample index
Returns:	dict(fl64)
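
The lookup can be sketched with plain Python containers. This is an illustration only, assuming that sample indices run consecutively from index_start and that the lookup goes by index label rather than by row position; the function and data below are hypothetical stand-ins, not MATK code.

```python
def pardict(parnames, samples, indices, index):
    """Return {parameter name: value} for the sample labelled `index`."""
    row = indices.index(index)  # position of the label, not the label itself
    return dict(zip(parnames, samples[row]))

# Hypothetical sample set with two parameters and three samples
parnames = ["k", "phi"]
samples = [(1.5, 0.10), (2.5, 0.20), (3.5, 0.30)]
index_start = 1
indices = list(range(index_start, index_start + len(samples)))

second = pardict(parnames, samples, indices, 2)
```

With index_start=1, index 2 refers to the second row of samples, so the off-by-one between labels and row positions is handled by the lookup.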

parnames: Array of parameter names

rank_parameter_frequencies()

Yields a printout of parameter value frequencies in the sample set

Returns:	An array of tuples, each containing the parameter name tagged as min or max and a second tuple containing the parameter value and the frequency of its appearance in the sample set.

recarray: Structured (record) array of samples

run(cpus=1, workdir_base=None, save=True, reuse_dirs=False, outfile=None, logfile=None, restart_logfile=None, verbose=True, hosts={})

Run model using values in samples for parameter values. If samples are not specified, LHS samples are produced.

Parameters:

cpus (int,dict(lst)) – number of cpus; alternatively, dictionary of lists of processor ids keyed by hostnames to run models on (i.e. on a cluster); hostname provided as kwarg to model (hostname=<hostname>); processor id provided as kwarg to model (processor=<processor id>)
workdir_base (str) – Base name for model run folders, run index is appended to workdir_base
save (bool) – If True, model files and folders will not be deleted during parallel model execution
reuse_dirs (bool) – If True, existing directories are reused; if False, an error is returned when a directory already exists
outfile (str) – File to write results to
logfile (str) – File to write details of run to during execution
restart_logfile (str) – Existing logfile containing completed runs, used to complete an incomplete sampling; Warning: sample indices are expected to match!
hosts (lst(str)) – Option deprecated, use cpus instead

Returns:

tuple(ndarray(fl64),ndarray(fl64)) - (matrix of responses from sampled model runs, siz rows by npar columns; parameter samples, same as input samples if provided)
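
The dictionary form of cpus can be sketched as follows. This is a standard-library illustration, not MATK's scheduler: the hostnames, the worker_slots helper, the model function, and the round-robin assignment are all hypothetical. Only the shape of the dictionary (lists of processor ids keyed by hostname) and the hostname and processor keyword arguments come from the parameter description above.

```python
def worker_slots(cpus):
    """Expand {hostname: [processor ids]} into (hostname, processor) pairs."""
    return [(host, proc) for host, procs in cpus.items() for proc in procs]

def model(pars, hostname=None, processor=None):
    """Hypothetical model that records where it was asked to run."""
    return {"ran_on": "%s/%s" % (hostname, processor), "y": 2.0 * pars["k"]}

# Hypothetical cluster layout: two hosts, five processor slots in total
cpus = {"node01": [0, 1], "node02": [0, 1, 2]}
slots = worker_slots(cpus)

# Seven hypothetical parameter samples
samples = [{"k": float(i)} for i in range(1, 8)]

results = []
for n, pars in enumerate(samples):
    host, proc = slots[n % len(slots)]  # round-robin, for illustration only
    results.append(model(pars, hostname=host, processor=proc))
```

A model that needs to launch an external executable on a particular node can read its target from these two keyword arguments.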

savetxt(outfile)

Save sampleset to file

Parameters:	outfile (str) – Name of file where sampleset will be written

sse: Sum of squared errors (sse) for all samples

subset(boolfcn, obs, *args, **kwargs)

Collect samples based on response values, remove all others

Parameters:

boolfcn – Function that returns True for samples to keep and False for samples to remove
obs (str) – Name of response to apply boolfcn to
args – Additional arguments passed to boolfcn
kwargs – Keyword arguments passed to boolfcn
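
The filtering idea can be sketched in plain Python. The assumptions here are that the boolean function is applied to one response value at a time and that the extra positional and keyword arguments are forwarded to it. The standalone subset function below returns new containers instead of modifying a sample set, and all names and data are hypothetical stand-ins.

```python
def subset(samples, responses, boolfcn, obs, *args, **kwargs):
    """Keep only the samples whose response `obs` satisfies boolfcn."""
    keep = [i for i, v in enumerate(responses[obs])
            if boolfcn(v, *args, **kwargs)]
    kept_samples = [samples[i] for i in keep]
    kept_responses = {name: [vals[i] for i in keep]
                      for name, vals in responses.items()}
    return kept_samples, kept_responses

def below(value, threshold=1.0):
    """Hypothetical criterion: keep responses under a threshold."""
    return value < threshold

# Hypothetical samples (one parameter) and two responses per sample
samples = [{"k": 1.0}, {"k": 2.0}, {"k": 3.0}, {"k": 4.0}]
responses = {"o1": [0.2, 1.7, 0.9, 3.1],
             "o2": [10.0, 20.0, 30.0, 40.0]}

kept_samples, kept_responses = subset(samples, responses, below, "o1",
                                      threshold=1.0)
```

The criterion is evaluated on the named response only, but the rows removed are dropped from the parameters and from every response alike.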

matk.sampleset.hist(rc, ncols=4, figsize=None, alpha=0.2, title=None, tight=False, mins=None, maxs=None, frequency=False, bins=10, ylim=None, printout=True, labels=[], filename=None, fontsize=None, xticks=3)

Plot histograms of dataset

Parameters:

ncols (int) – Number of columns in plot matrix
figsize (tuple(fl64,fl64)) – Width and height of figure in inches
alpha (float) – Histogram color shading
title (str) – Title of plot
tight (bool) – Use matplotlib tight layout
mins (lst(fl64)) – Minimum values of recarray fields
maxs (lst(fl64)) – Maximum values of recarray fields
frequency (bool) – If True, the first element of the return tuple will be the counts normalized by the length of data, i.e., n/len(x)
bins (int or lst(lst(int))) – If an integer is given, bins + 1 bin edges are returned. Unequally spaced bins are supported if bins is a list of sequences for each histogram.
ylim (tuple) – 2-element tuple with y-axis limits for histograms
labels (lst(str)) – Names to use instead of parameter names in plot
filename (str) – Name of file to save plot to. The file ending determines the plot type (pdf, png, ps, eps, etc.). The plot types available depend on the matplotlib backend in use on the system. If a filename is given, the plot will not be displayed.
fontsize (fl64) – Size of font
xticks (int) – Number of ticks on x axes

Returns:

dict(lst(int),lst(fl64)) - dictionary of histogram data (counts,bins) keyed by name
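
The effect of the bins and frequency options can be illustrated with a short standard-library sketch. It assumes equally spaced bins between the data minimum and maximum and is not the plotting code itself; the histogram helper and the data are hypothetical.

```python
def histogram(data, bins=10, frequency=False):
    """Return (counts, edges) for equally spaced bins; edges has bins + 1 entries."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        # The maximum value falls into the last bin rather than past it
        k = min(int((x - lo) / width), bins - 1)
        counts[k] += 1
    if frequency:
        # Normalize by the number of data points, i.e. n/len(x)
        counts = [c / len(data) for c in counts]
    return counts, edges

# Hypothetical values of a single field
data = [0.0, 0.1, 0.2, 0.5, 0.5, 0.9, 1.0, 1.0]

counts, edges = histogram(data, bins=4)
freqs, _ = histogram(data, bins=4, frequency=True)
```

With frequency=True the bin heights sum to 1, which makes histograms of sample sets with different sizes directly comparable.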

matk.sampleset.corr(rc1, rc2, type='pearson', plot=False, printout=True, plotvals=True, figsize=None, title=None)

Calculate correlation coefficients of parameters and responses

Parameters:

rc1 – Data
rc2 – Data
type (str) – Type of correlation coefficient (pearson by default, spearman also available)
plot (bool) – If True, plot correlation matrix
printout (bool) – If True, print correlation matrix with row and column headings
plotvals (bool) – If True, print correlation coefficients on plot matrix
figsize (tuple(fl64,fl64)) – Width and height of figure in inches
title (str) – Title of plot

Returns:

ndarray(fl64) – Correlation coefficients
