Title: | Bayesian Probit Choice Modeling |
---|---|
Description: | Bayes estimation of probit choice models, both in the cross-sectional and panel setting. The package can analyze binary, multivariate, ordered, and ranked choices, as well as heterogeneity of choice behavior among deciders. The main functionality includes model fitting via Markov chain Monte Carlo methods, tools for convergence diagnostics, choice data simulation, in-sample and out-of-sample choice prediction, and model selection using information criteria and Bayes factors. The latent class model extension facilitates preference-based decider classification, where the number of latent classes can be inferred via the Dirichlet process or a weight-based updating heuristic. This allows for flexible modeling of choice behavior without the need to impose structural constraints. For a reference on the method see Oelschlaeger and Bauer (2021) <https://trid.trb.org/view/1759753>. |
Authors: | Lennart Oelschläger [aut, cre] , Dietmar Bauer [aut] , Sebastian Büscher [ctb], Manuel Batram [ctb] |
Maintainer: | Lennart Oelschläger <[email protected]> |
License: | GPL-3 |
Version: | 1.1.4 |
Built: | 2025-01-12 05:54:25 UTC |
Source: | https://github.com/loelschlaeger/rprobitb |
In {RprobitB}, alternative specific covariates must be named in the format "<covariate>_<alternative>". This convenience function generates the format for a given choice_data set.
as_cov_names(choice_data, cov, alternatives)
choice_data | A data.frame of choice data. |
cov | A character vector of the names of alternative specific covariates in choice_data. |
alternatives | A (character or numeric) vector of the alternative names. |
The choice_data input with updated column names.
data("Electricity", package = "mlogit")
cov <- c("pf", "cl", "loc", "wk", "tod", "seas")
alternatives <- 1:4
colnames(Electricity)
Electricity <- as_cov_names(Electricity, cov, alternatives)
colnames(Electricity)
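The naming convention can be illustrated without the package. The following base R sketch uses a hypothetical helper rename_cov() and assumes that the input columns are named "<covariate><alternative>" (e.g. "pf1"); it is not the implementation of as_cov_names().

```r
# Hypothetical helper (not part of RprobitB): insert an underscore between
# the covariate name and the alternative name.
rename_cov <- function(col_names, cov, alternatives) {
  for (p in cov) {
    for (j in alternatives) {
      old <- paste0(p, j)                       # assumed input name, e.g. "pf1"
      col_names[col_names == old] <- paste0(p, "_", j)
    }
  }
  col_names
}

cols <- c("id", "choice", "pf1", "pf2", "cl1", "cl2")
new_cols <- rename_cov(cols, cov = c("pf", "cl"), alternatives = 1:2)
print(new_cols)
```

Columns that do not match a covariate and alternative pair, such as "id" and "choice", are left unchanged.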
This function checks the input form.
check_form(form, re = NULL, ordered = FALSE)
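form | A formula object that specifies the model equation, of the structure choice ~ A | B | C. Multiple covariates (of one type) are separated by a + sign. In the ordered probit model (ordered = TRUE), the formula has the simple structure choice ~ A. |
re | A character (vector) of covariates of form with random effects. |
ordered | A boolean, FALSE by default. If TRUE, the choice alternatives are treated as ordered. |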
A list that contains the following elements:
The input form.
The name choice of the dependent variable in form.
The input re.
A list vars of three character vectors of covariate names of the three covariate types.
A boolean ASC, determining whether the model has ASCs.
overview_effects() for an overview of the model effects
This function checks the compatibility of submitted parameters for the prior distributions and sets missing values to default values.
check_prior( P_f, P_r, J, ordered = FALSE, eta = numeric(P_f), Psi = diag(P_f), delta = 1, xi = numeric(P_r), D = diag(P_r), nu = P_r + 2, Theta = diag(P_r), kappa = if (ordered) 4 else (J + 1), E = if (ordered) diag(1) else diag(J - 1), zeta = numeric(J - 2), Z = diag(J - 2) )
P_f | The number of covariates connected to a fixed coefficient (can be 0). |
P_r | The number of covariates connected to a random coefficient (can be 0). |
J | The number (greater than or equal to 2) of choice alternatives. |
ordered | A boolean, FALSE by default. If TRUE, the choice alternatives are treated as ordered. |
eta | The mean vector of length P_f of the normal prior for alpha. |
Psi | The covariance matrix of dimension P_f x P_f of the normal prior for alpha. |
delta | A numeric for the concentration parameter vector rep(delta, C) of the Dirichlet prior for s. |
xi | The mean vector of length P_r of the normal prior for each b_c. |
D | The covariance matrix of dimension P_r x P_r of the normal prior for each b_c. |
nu | The degrees of freedom (a natural number greater than P_r) of the Inverse Wishart prior for each Omega_c. |
Theta | The scale matrix of dimension P_r x P_r of the Inverse Wishart prior for each Omega_c. |
kappa | The degrees of freedom (a natural number greater than J - 1) of the Inverse Wishart prior for Sigma. |
E | The scale matrix of dimension (J - 1) x (J - 1) of the Inverse Wishart prior for Sigma. |
zeta | The mean vector of length J - 2 of the normal prior for the logarithmic increments d of the utility thresholds in the ordered probit model. |
Z | The covariance matrix of dimension (J - 2) x (J - 2) of the normal prior for d in the ordered probit model. |
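A priori, we assume that the model parameters follow these distributions:
$\alpha \sim N(\eta, \Psi)$
$s \sim D(\delta)$
$b_c \sim N(\xi, D)$ for all classes $c$
$\Omega_c \sim W^{-1}(\nu, \Theta)$ for all classes $c$
$\Sigma \sim W^{-1}(\kappa, E)$
$d \sim N(\zeta, Z)$
where $N$ denotes the normal, $D$ the Dirichlet, and $W^{-1}$ the Inverted Wishart distribution.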
An object of class RprobitB_prior, which is a list containing all prior parameters. Parameters that are not relevant for the model configuration are set to NA.
check_prior(P_f = 1, P_r = 2, J = 3, ordered = TRUE)
This function returns the choice probabilities of an RprobitB_fit object.
choice_probabilities(x, data = NULL, par_set = mean)
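x | An object of class RprobitB_fit. |
data | Either NULL (the default, in which case the data used for model fitting is taken) or an object of class RprobitB_data. |
par_set | Specifies the parameter set for the calculation, e.g. a function that computes a posterior point estimate (the default is mean). |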
A data frame of choice probabilities with choice situations in rows and alternatives in columns. The first two columns are the decider identifier "id" and the choice situation identifier "idc".
data <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)
x <- fit_model(data)
choice_probabilities(x)
This function classifies the deciders based on their allocation to the components of the mixing distribution.
classification(x, add_true = FALSE)
x | An object of class RprobitB_fit. |
add_true | Set to TRUE to add the true class memberships to the output (if available). |
The function can only be used if the model has at least one random effect (i.e. P_r >= 1) and at least two latent classes (i.e. C >= 2).
In that case, let $z_1, \dots, z_N$ denote the class allocations of the $N$ deciders based on their estimated mixed coefficients $\beta = (\beta_1, \dots, \beta_N)$. Independently for each decider $n$, the conditional probability $\Pr(z_n = c \mid s, \beta_n, b, \Omega)$ of having $\beta_n$ allocated to class $c$ for $c = 1, \dots, C$ depends on the class weight vector $s$, the class means $b = (b_c)_c$, and the class covariance matrices $\Omega = (\Omega_c)_c$, and is proportional to
$s_c \phi(\beta_n \mid b_c, \Omega_c)$.
This function displays the relative frequencies with which each decider was allocated to the classes during the Gibbs sampling. Only the thinned samples after the burn-in period are considered.
A data frame. The row names are the decider ids. The first C columns contain the relative frequencies with which the deciders are allocated to the C classes. Next, the column est contains the estimated class of the decider based on the highest allocation frequency. If add_true, the next column true contains the true class memberships.
update_z() for the updating function of the class allocation vector.
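The conditional class probability described in the details can be sketched in base R. This is a minimal illustration, assuming the probability is proportional to the class weight times the multivariate normal density of the coefficient vector; the helpers dmvn() and class_probs() and the list layout of Omega are hypothetical and not the package's update_z() implementation.

```r
# Multivariate normal density, written out to avoid extra packages.
dmvn <- function(x, mean, cov) {
  k <- length(x)
  dev <- x - mean
  quad <- drop(t(dev) %*% solve(cov) %*% dev)
  exp(-0.5 * quad) / sqrt((2 * pi)^k * det(cov))
}

# Normalized class probabilities for one decider with coefficients beta_n.
# s: class weights, b: class means in columns, Omega: list of covariances.
class_probs <- function(beta_n, s, b, Omega) {
  w <- vapply(seq_along(s), function(c) {
    s[c] * dmvn(beta_n, b[, c], Omega[[c]])
  }, numeric(1))
  w / sum(w)
}

s <- c(0.6, 0.4)
b <- cbind(c(-1, -1), c(1, 1))
Omega <- list(diag(2), diag(2))
probs <- class_probs(c(0.9, 1.1), s, b, Omega)
print(probs)
```

In this toy setting the coefficient vector lies close to the mean of the second class, so the second class receives the larger probability despite its smaller weight.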
This function extracts the estimated model effects.
## S3 method for class 'RprobitB_fit'
coef(object, ...)
object | An object of class RprobitB_fit. |
... | Ignored. |
An object of class RprobitB_coef.
This function computes the probability for each observed choice at the (normalized, burned and thinned) samples from the posterior. These probabilities are required to compute the WAIC and the marginal model likelihood mml.
compute_p_si(x, ncores = parallel::detectCores() - 1, recompute = FALSE)
x | An object of class RprobitB_fit. |
ncores | This function is parallelized; set the number of cores here. |
recompute | Set to TRUE to recompute the probabilities. |
The object x, including the object p_si, which is a matrix of probabilities, observations in rows and posterior samples in columns.
This convenience function returns the estimated covariance matrix of the mixing distribution.
cov_mix(x, cor = FALSE)
x | An object of class RprobitB_fit. |
cor | If TRUE, returns the correlation matrix instead. |
The estimated covariance matrix of the mixing distribution. In case of multiple classes, a list of matrices for each class.
This function creates lagged choice covariates from the data.frame choice_data, which is assumed to be sorted by the choice occasions.
create_lagged_cov(choice_data, column, k = 1, id = "id")
choice_data | A data.frame of choice data. |
column | A character, the column name in choice_data, i.e. the covariate name. Can be a vector. |
k | A positive number, the number of lags (in units of observations), see the details. Can be a vector. The default is k = 1. |
id | A character, the name of the column in choice_data that contains a unique identifier for each decision maker. The default is "id". |
Say that choice_data contains the column column. Then, the function call create_lagged_cov(choice_data, column, k, id) returns the input choice_data which includes a new column named column.k. This column contains for each decider (based on id) and each choice occasion the covariate faced before k choice occasions. If this data point is not available, it is set to NA. In particular, the first k values of column.k will be NA (initial condition problem).
The input choice_data with the additional columns named column.k for each element column and each number k containing the lagged covariates.
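The lagging logic described in the details can be reproduced in base R. The helper lag_by_id() below is a hypothetical sketch for a single column and a single lag, assuming the data is already sorted by choice occasion within each decider; it is not the package's implementation.

```r
# Hypothetical helper: lag one column by k observations within each decider.
lag_by_id <- function(df, column, k = 1, id = "id") {
  new_name <- paste0(column, ".", k)
  df[[new_name]] <- ave(
    df[[column]], df[[id]],
    # the first k entries per decider are unavailable and set to NA
    FUN = function(v) c(rep(NA, min(k, length(v))), head(v, -k))
  )
  df
}

df <- data.frame(id = c(1, 1, 1, 2, 2), price = c(10, 11, 12, 20, 21))
lagged <- lag_by_id(df, "price", k = 1)
print(lagged)
```

The first observation of each decider has no predecessor, which is the initial condition problem mentioned above.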
This function performs Markov chain Monte Carlo simulation for fitting different types of probit models (binary, multivariate, mixed, latent class, ordered, ranked) to discrete choice data.
fit_model( data, scale = "Sigma_1,1 := 1", R = 1000, B = R/2, Q = 1, print_progress = getOption("RprobitB_progress"), prior = NULL, latent_classes = NULL, seed = NULL, fixed_parameter = list() )
data | An object of class RprobitB_data. |
scale | A character which determines the utility scale. It is of the form "<parameter> := <value>", where <parameter> is either the name of a fixed effect or Sigma_<j>,<j> for the <j>th diagonal element of Sigma, and <value> is the value of the fixed parameter. |
R | The number of iterations of the Gibbs sampler. |
B | The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Q | The thinning factor for the Gibbs samples, i.e. only every Qth sample is kept. |
print_progress | A boolean, determining whether to print the Gibbs sampler progress and the estimated remaining computation time. |
prior | A named list of parameters for the prior distributions. See the documentation of check_prior() for details about which parameters can be specified. |
latent_classes | Either NULL (for no latent classes) or a list of parameters specifying the number of latent classes and their updating scheme. |
seed | Set a seed for the Gibbs sampling. |
fixed_parameter | Optionally specify a named list with fixed parameter values for alpha, C, s, b, Omega, Sigma, Sigma_full, beta, z, or d for the simulation. See RprobitB_parameter() for their definitions. |
See the vignette on model fitting for more details.
An object of class RprobitB_fit.
prepare_data() and simulate_choices() for building an RprobitB_data object
update() for estimating nested models
transform() for transforming a fitted model
data <- simulate_choices(
  form = choice ~ var | 0, N = 100, T = 10, J = 3, seed = 1
)
model <- fit_model(data = data, R = 1000, seed = 1)
summary(model)
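The interplay of R, B, and Q can be made concrete with a little index arithmetic. The sketch below shows one common convention (discard the first B draws, then keep every Qth of the remaining ones); which exact indices the package retains is an assumption here.

```r
R <- 1000    # number of Gibbs iterations
B <- R / 2   # burn-in length (the default in fit_model)
Q <- 5       # thinning factor

# Indices of the draws kept under the assumed convention.
kept <- seq(from = B + 1, to = R, by = Q)
n_kept <- length(kept)
print(n_kept)
```

A larger Q reduces autocorrelation among the retained draws at the price of a smaller posterior sample.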
This convenience function returns the covariates and the choices of specific choice occasions.
get_cov(x, id, idc, idc_label)
x | Either an object of class RprobitB_data or RprobitB_fit. |
id | A numeric (vector) that specifies the decider(s). |
idc | A numeric (vector) that specifies the choice occasion(s). |
idc_label | The name of the column that contains the choice occasion identifier. |
A subset of the choice_data data frame specified in prepare_data().
This function approximates the model's marginal likelihood.
mml(x, S = 0, ncores = parallel::detectCores() - 1, recompute = FALSE)
x | An object of class RprobitB_fit. |
S | The number of prior samples for the prior arithmetic mean estimate. Per default, S = 0, in which case only the posterior harmonic mean estimate is used. |
ncores | Computation of the prior arithmetic mean estimate is parallelized; set the number of cores here. |
recompute | Set to TRUE to recompute the likelihood. |
The model's marginal likelihood $p(y \mid M)$ for a model $M$ and data $y$ is required for the computation of Bayes factors. In general, the term has no closed form and must be approximated numerically.
This function uses the posterior Gibbs samples to approximate the model's marginal likelihood via the posterior harmonic mean estimator. To check the convergence, call plot(x$mml), where x is the output of this function. If the estimation does not seem to have converged, you can improve the approximation by combining the value with the prior arithmetic mean estimator. The final approximation of the model's marginal likelihood is then a weighted sum of the posterior harmonic mean estimate and the prior arithmetic mean estimate, where the weights are determined by the sample sizes.
The object x, including the object mml, which is the model's approximated marginal likelihood value.
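The posterior harmonic mean estimator mentioned in the details is the reciprocal of the average inverse likelihood over the posterior draws. The base R sketch below computes it on the log scale for numerical stability; the function log_harmonic_mean() and its input (a vector of log-likelihood values, one per posterior draw) are hypothetical and only illustrate the estimator, not the package's code.

```r
# Log of the harmonic mean of likelihood values, given their logarithms.
log_harmonic_mean <- function(log_lik) {
  a <- -log_lik              # logs of the inverse likelihoods
  m <- max(a)                # shift for a stable log-mean-exp
  -(m + log(mean(exp(a - m))))
}

# Toy log-likelihood values at two posterior draws.
est <- log_harmonic_mean(c(-10, -12))
print(est)
```

The harmonic mean is dominated by draws with small likelihood, which is one reason the estimate can converge slowly and why combining it with a prior-based estimate can help.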
This function returns a table with several criteria for model comparison.
model_selection( ..., criteria = c("npar", "LL", "AIC", "BIC"), add_form = FALSE )
... | One or more objects of class RprobitB_fit. |
criteria | A vector of one or more criteria names as characters, e.g. "npar", "LL", "AIC", and "BIC" (the defaults). |
add_form | Set to TRUE to add the model formulas. |
See the vignette on model selection for more details.
A data frame, criteria in columns, models in rows.
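Two of the default criteria follow directly from the log-likelihood and the number of parameters via the standard textbook definitions. The helper info_criteria() below is a hypothetical sketch; in particular, what the package counts as the number of observations n_obs is an assumption.

```r
# AIC and BIC from the log-likelihood LL, the number of parameters npar,
# and the number of observations n_obs (standard definitions).
info_criteria <- function(LL, npar, n_obs) {
  c(
    AIC = -2 * LL + 2 * npar,
    BIC = -2 * LL + npar * log(n_obs)
  )
}

ic <- info_criteria(LL = -500, npar = 4, n_obs = 1000)
print(ic)
```

For both criteria, smaller values indicate a better trade-off between fit and complexity.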
This function extracts the number of model parameters of an RprobitB_fit object.
npar(object, ...)
## S3 method for class 'RprobitB_fit'
npar(object, ...)
object | An object of class RprobitB_fit. |
... | Optionally more objects of class RprobitB_fit. |
Either a numeric value (if just one object is provided) or a numeric vector.
This function gives an overview of the effect names, whether the covariate is alternative-specific, whether the coefficient is alternative-specific, and whether it is a random effect.
overview_effects( form, re = NULL, alternatives, base = tail(alternatives, 1), ordered = FALSE )
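form | A formula object that specifies the model equation, of the structure choice ~ A | B | C. Multiple covariates (of one type) are separated by a + sign. In the ordered probit model (ordered = TRUE), the formula has the simple structure choice ~ A. |
re | A character (vector) of covariates of form with random effects. |
alternatives | A character vector with the names of the choice alternatives. If not specified, the choice set is defined by the observed choices. If ordered = TRUE, alternatives is assumed to be specified with the alternatives ordered from worst to best. |
base | A character, the name of the base alternative for covariates that are not alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to NULL if the model has no alternative specific covariates (e.g. in the ordered probit model). Per default, base is the last element of alternatives. |
ordered | A boolean, FALSE by default. If TRUE, the choice alternatives are treated as ordered. |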
A data frame, each row is an effect, columns are the effect name "effect", and booleans whether the covariate is alternative-specific "as_value", whether the coefficient is alternative-specific "as_coef", and whether it is a random effect "random".
check_form() for checking the model formula specification.
overview_effects(
  form = choice ~ price + time + comfort + change | 1,
  re = c("price", "time"),
  alternatives = c("A", "B"),
  base = "A"
)
This function draws receiver operating characteristic (ROC) curves.
plot_roc(..., reference = NULL)
... | One or more RprobitB_fit objects or data.frames of choice probabilities. |
reference | The reference alternative. |
No return value. Draws a plot to the current device.
This function is the plot method for an object of class RprobitB_data.
## S3 method for class 'RprobitB_data'
plot(x, by_choice = FALSE, alpha = 1, position = "dodge", ...)
x | An object of class RprobitB_data. |
by_choice | Set to TRUE to group the covariates by the chosen alternatives. |
alpha, position | Passed to ggplot. |
... | Ignored. |
No return value. Draws a plot to the current device.
data <- simulate_choices(
  form = choice ~ cost | 0, N = 100, T = 10, J = 2,
  alternatives = c("bus", "car"),
  true_parameter = list("alpha" = -1)
)
plot(data, by_choice = TRUE)
This function is the plot method for an object of class RprobitB_fit.
## S3 method for class 'RprobitB_fit'
plot(x, type, ignore = NULL, ...)
x | An object of class RprobitB_fit. |
type | The type of plot, which can be one of: "mixture" to visualize the mixing distribution, "acf" for autocorrelation plots of the Gibbs samples, "trace" for trace plots of the Gibbs samples, "class_seq" to visualize the sequence of class numbers. See the details section for visualization options. |
ignore | A character (vector) of covariate or parameter names that do not get visualized. |
... | Ignored. |
No return value. Draws a plot to the current device.
This function computes the point estimates of an RprobitB_fit. Per default, the mean of the Gibbs samples is used as a point estimate. However, any statistic that computes a single numeric value out of a vector of Gibbs samples can be specified for FUN.
point_estimates(x, FUN = mean)
x | An object of class RprobitB_fit. |
FUN | A function that computes a single numeric value out of a vector of numeric values. |
An object of class RprobitB_parameter.
data <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)
model <- fit_model(data)
point_estimates(model)
point_estimates(model, FUN = median)
This function computes the prediction accuracy of an RprobitB_fit object. Prediction accuracy means the share of choices that are correctly predicted by the model, where prediction is based on the maximum choice probability.
pred_acc(x, ...)
x | An object of class RprobitB_fit. |
... | Optionally specify more RprobitB_fit objects. |
A numeric.
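The definition of prediction accuracy above translates into a few lines of base R. The helper pred_accuracy() and the toy probability matrix are hypothetical; the sketch only illustrates prediction by maximum choice probability and is not the package's implementation.

```r
# Share of observed choices that equal the alternative with the highest
# predicted probability. probs: choice situations in rows, alternatives
# in (named) columns. Ties are resolved in favor of the first alternative.
pred_accuracy <- function(probs, observed) {
  predicted <- colnames(probs)[max.col(probs, ties.method = "first")]
  mean(predicted == observed)
}

probs <- matrix(
  c(0.7, 0.3,
    0.4, 0.6,
    0.8, 0.2),
  ncol = 2, byrow = TRUE, dimnames = list(NULL, c("A", "B"))
)
acc <- pred_accuracy(probs, observed = c("A", "B", "B"))
print(acc)
```

In the toy data the third choice is mispredicted, so two of three choices are predicted correctly.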
This function predicts the discrete choice behavior.
## S3 method for class 'RprobitB_fit'
predict(object, data = NULL, overview = TRUE, digits = 2, ...)
object | An object of class RprobitB_fit. |
data | Either NULL (using the data in object), an object of class RprobitB_data (for example the test part generated by train_test()), or a data frame of custom choice characteristics. |
overview | If TRUE, returns a confusion matrix. |
digits | The number of digits of the returned choice probabilities. |
... | Ignored. |
Predictions are made based on the maximum predicted probability for each choice alternative. See the vignette on choice prediction for a demonstration on how to visualize the model's sensitivity and specificity by means of a receiver operating characteristic (ROC) curve.
Either a table if overview = TRUE or a data frame otherwise.
data <- simulate_choices(
  form = choice ~ cov, N = 10, T = 10, J = 2, seed = 1
)
data <- train_test(data, test_proportion = 0.5)
model <- fit_model(data$train)
predict(model)
predict(model, overview = FALSE)
predict(model, data = data$test)
predict(
  model,
  data = data.frame("cov_A" = c(1, 1, NA, NA), "cov_B" = c(1, NA, 1, NA)),
  overview = FALSE
)
This function prepares choice data for estimation.
prepare_data( form, choice_data, re = NULL, alternatives = NULL, ordered = FALSE, ranked = FALSE, base = NULL, id = "id", idc = NULL, standardize = NULL, impute = "complete_cases" )
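form | A formula object that specifies the model equation, of the structure choice ~ A | B | C. Multiple covariates (of one type) are separated by a + sign. In the ordered probit model (ordered = TRUE), the formula has the simple structure choice ~ A. |
choice_data | A data.frame of choice data. |
re | A character (vector) of covariates of form with random effects. |
alternatives | A character vector with the names of the choice alternatives. If not specified, the choice set is defined by the observed choices. If ordered = TRUE, alternatives is assumed to be specified with the alternatives ordered from worst to best. |
ordered | A boolean, FALSE by default. If TRUE, the choice alternatives are treated as ordered. |
ranked | TBA |
base | A character, the name of the base alternative for covariates that are not alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to NULL if the model has no alternative specific covariates (e.g. in the ordered probit model). Per default, base is the last element of alternatives. |
id | A character, the name of the column in choice_data that contains a unique identifier for each decision maker. The default is "id". |
idc | A character, the name of the column in choice_data that contains a unique identifier for each choice situation of each decision maker. If NULL (the default), these identifiers are generated by the order of appearance. |
standardize | A character vector of names of covariates that get standardized. Covariates of type 1 or 3 have to be addressed by <covariate>_<alternative>. |
impute | A character that specifies how to handle missing covariate entries in choice_data. The default is "complete_cases". |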
Requirements for the data.frame choice_data:
It must contain a column named id which contains a unique identifier for each decision maker.
It can contain a column named idc which contains a unique identifier for each choice situation of each decision maker. If this information is missing, these identifiers are generated automatically by the appearance of the choices in the data set.
It can contain a column named choice with the observed choices, where choice must match the name of the dependent variable in form. Such a column is required for model fitting but not for prediction.
It must contain a numeric column named p_j for each alternative specific covariate p in form and each choice alternative j in alternatives.
It must contain a numeric column named q for each covariate q in form that is constant across alternatives.
In the ranked case (ranked = TRUE), the column choice must contain the full ranking of the alternatives in each choice occasion as a character, where the alternatives are separated by commas, see the examples.
See the vignette on choice data for more details.
An object of class RprobitB_data.
check_form() for checking the model formula
overview_effects() for an overview of the model effects
create_lagged_cov() for creating lagged covariates
as_cov_names() for re-labeling alternative-specific covariates
simulate_choices() for simulating choice data
train_test() for splitting choice data into a train and test subset
data <- prepare_data(
  form = choice ~ price + time + comfort + change | 0,
  choice_data = train_choice,
  re = c("price", "time"),
  id = "deciderID",
  idc = "occasionID",
  standardize = c("price", "time")
)

### ranked case
choice_data <- data.frame(
  "id" = 1:3, "choice" = c("A,B,C", "A,C,B", "B,C,A"), "cov" = 1
)
data <- prepare_data(
  form = choice ~ 0 | cov + 0,
  choice_data = choice_data,
  ranked = TRUE
)
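The comma-separated ranking format used in the ranked case can be unpacked with base R. This is only an illustration of the string format; how prepare_data() parses it internally is not shown here.

```r
# Each entry holds the full ranking of one choice occasion, best first.
choice <- c("A,B,C", "A,C,B", "B,C,A")

# Split each ranking into its alternatives.
rankings <- strsplit(choice, ",", fixed = TRUE)

# The most preferred alternative per choice occasion.
first_choice <- vapply(rankings, `[`, character(1), 1)
print(first_choice)
```

Every ranking must list all alternatives, so each element of rankings has the same length as the choice set.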
This function computes the Gelman-Rubin statistic R_hat.
R_hat(samples, parts = 2)
samples | A vector or a matrix of samples from a Markov chain, e.g. Gibbs samples. If samples is a matrix, each column gives the samples for a separate run. |
parts | The number of parts to divide each chain into sub-chains. |
A numeric value, the Gelman-Rubin statistic.
https://bookdown.org/rdpeng/advstatcomp/monitoring-convergence.html
no_chains <- 2
length_chains <- 1e3
samples <- matrix(NA_real_, length_chains, no_chains)
samples[1, ] <- 1
Gamma <- matrix(c(0.8, 0.1, 0.2, 0.9), 2, 2)
for (c in 1:no_chains) {
  for (t in 2:length_chains) {
    samples[t, c] <- sample(1:2, 1, prob = Gamma[samples[t - 1, c], ])
  }
}
R_hat(samples)
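For intuition, the statistic can be computed by hand. The sketch below follows a common textbook formulation (split chains, compare within-chain and between-chain variance, take the square root of the ratio); the function gelman_rubin() is hypothetical, and the package's R_hat() may differ in details such as scaling.

```r
gelman_rubin <- function(samples, parts = 2) {
  samples <- as.matrix(samples)
  L <- floor(nrow(samples) / parts)   # length of each sub-chain
  # Cut every chain (column) into `parts` consecutive sub-chains.
  sub <- do.call(cbind, lapply(seq_len(ncol(samples)), function(j) {
    matrix(samples[seq_len(L * parts), j], nrow = L)
  }))
  W <- mean(apply(sub, 2, var))       # within-chain variance
  B <- L * var(colMeans(sub))         # between-chain variance
  sqrt(((L - 1) / L * W + B / L) / W)
}

set.seed(1)
mixed <- matrix(rnorm(2000), ncol = 2)             # two well-mixed chains
stuck <- cbind(rnorm(1000), rnorm(1000, mean = 5)) # chains that disagree
r_mixed <- gelman_rubin(mixed)
r_stuck <- gelman_rubin(stuck)
print(c(r_mixed, r_stuck))
```

Values close to 1 indicate that the sub-chains agree, whereas clearly larger values, as for the second example, point to a lack of convergence.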
This function creates an object of class RprobitB_parameter, which contains the parameters of a probit model. If sample = TRUE, missing parameters are sampled. All parameters are checked against the values of P_f, P_r, J, and N.
RprobitB_parameter( P_f, P_r, J, N, ordered = FALSE, alpha = NULL, C = NULL, s = NULL, b = NULL, Omega = NULL, Sigma = NULL, Sigma_full = NULL, beta = NULL, z = NULL, d = NULL, seed = NULL, sample = TRUE )
P_f | The number of covariates connected to a fixed coefficient (can be 0). |
P_r | The number of covariates connected to a random coefficient (can be 0). |
J | The number (greater than or equal to 2) of choice alternatives. |
N | The number (greater than or equal to 1) of decision makers. |
ordered | A boolean, FALSE by default. If TRUE, the choice alternatives are treated as ordered. |
alpha | The fixed coefficient vector of length P_f. |
C | The number (greater than or equal to 1) of latent classes of decision makers. Set to NA if P_r = 0. |
s | The vector of class weights of length C. |
b | The matrix of class means as columns of dimension P_r x C. |
Omega | The matrix of class covariance matrices as columns of dimension (P_r * P_r) x C. |
Sigma | The differenced error term covariance matrix of dimension (J - 1) x (J - 1). |
Sigma_full | The error term covariance matrix of dimension J x J. |
beta | The matrix of the decision-maker specific coefficient vectors of dimension P_r x N. |
z | The vector of the allocation variables of length N. |
d | The numeric vector of the logarithmic increases of the utility thresholds in the ordered probit case (ordered = TRUE). |
seed | Set a seed for the sampling of missing parameters. |
sample | A boolean, if TRUE (the default), missing parameters get sampled. |
An object of class RprobitB_parameter, i.e. a named list with the model parameters alpha, C, s, b, Omega, Sigma, Sigma_full, beta, and z.
RprobitB_parameter(P_f = 1, P_r = 2, J = 3, N = 10)
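The "class covariance matrices as columns" layout of Omega can be illustrated in base R. The sketch assumes that each column holds one vectorized P_r x P_r matrix; because covariance matrices are symmetric, the vectorization order does not matter for this illustration.

```r
P_r <- 2
Omega_1 <- matrix(c(1, 0.3, 0.3, 2), P_r, P_r)   # covariance of class 1
Omega_2 <- diag(P_r)                             # covariance of class 2

# Stack the vectorized matrices as columns: dimension (P_r * P_r) x C.
Omega <- cbind(as.vector(Omega_1), as.vector(Omega_2))
print(dim(Omega))

# Recover the covariance matrix of class 1 from its column.
recovered <- matrix(Omega[, 1], P_r, P_r)
print(recovered)
```

Storing the matrices as columns keeps all classes in a single matrix whose number of columns equals the number of latent classes C.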
This function simulates choice data from a probit model.
simulate_choices( form, N, T = 1, J, re = NULL, alternatives = NULL, ordered = FALSE, ranked = FALSE, base = NULL, covariates = NULL, seed = NULL, true_parameter = list() )
form | A formula object that specifies the model equation, of the structure choice ~ A | B | C. Multiple covariates (of one type) are separated by a + sign. In the ordered probit model (ordered = TRUE), the formula has the simple structure choice ~ A. |
N | The number (greater than or equal to 1) of decision makers. |
T | The number (greater than or equal to 1) of choice occasions or a vector of choice occasions of length N. Per default, T = 1. |
J | The number (greater than or equal to 2) of choice alternatives. |
re | A character (vector) of covariates of form with random effects. |
alternatives | A character vector with the names of the choice alternatives. If not specified, the choice set is defined by the observed choices. If ordered = TRUE, alternatives is assumed to be specified with the alternatives ordered from worst to best. |
ordered | A boolean, FALSE by default. If TRUE, the choice alternatives are treated as ordered. |
ranked | TBA |
base | A character, the name of the base alternative for covariates that are not alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to NULL if the model has no alternative specific covariates (e.g. in the ordered probit model). Per default, base is the last element of alternatives. |
covariates | A named list of covariate values. Each element must be a vector of length equal to the number of choice occasions and named according to a covariate. Covariates for which no values are supplied are drawn from a standard normal distribution. |
seed | Set a seed for the simulation. |
true_parameter | Optionally specify a named list with true parameter values for alpha, C, s, b, Omega, Sigma, Sigma_full, beta, z, or d for the simulation. See RprobitB_parameter() for their definitions. |
See the vignette on choice data for more details.
An object of class RprobitB_data.
check_form() for checking the model formula
overview_effects() for an overview of the model effects
create_lagged_cov() for creating lagged covariates
as_cov_names() for re-labeling alternative-specific covariates
prepare_data() for preparing empirical choice data
train_test() for splitting choice data into a train and test subset
### simulate data from a binary probit model with two latent classes
data <- simulate_choices(
  form = choice ~ cost | income | time,
  N = 100,
  T = 10,
  J = 2,
  re = c("cost", "time"),
  alternatives = c("car", "bus"),
  seed = 1,
  true_parameter = list(
    "alpha" = c(-1, 1),
    "b" = matrix(c(-1, -1, -0.5, -1.5, 0, -1), ncol = 2),
    "C" = 2
  )
)

### simulate data from an ordered probit model
data <- simulate_choices(
  form = opinion ~ age + gender,
  N = 10,
  T = 1:10,
  J = 5,
  alternatives = c("very bad", "bad", "indifferent", "good", "very good"),
  ordered = TRUE,
  covariates = list(
    "gender" = rep(sample(c(0, 1), 10, replace = TRUE), times = 1:10)
  ),
  seed = 1
)

### simulate data from a ranked probit model
data <- simulate_choices(
  form = product ~ price,
  N = 10,
  T = 1:10,
  J = 3,
  alternatives = c("A", "B", "C"),
  ranked = TRUE,
  seed = 1
)
Data set of 2929 stated choices by 235 Dutch individuals deciding between two virtual train trip options "A" and "B" based on the price, the travel time, the number of rail-to-rail transfers (changes), and the level of comfort.
The data were obtained in 1987 by Hague Consulting Group for the National Dutch Railways. Prices were recorded in Dutch guilder and in this data set transformed to Euro at an exchange rate of 2.20371 guilders = 1 Euro.
train_choice
A data.frame with 2929 rows and 11 columns:
deciderID | integer identifier for the decider |
occasionID | integer identifier for the choice occasion |
choice | character for the chosen alternative (either "A" or "B") |
price_A | numeric price for alternative "A" in Euro |
time_A | numeric travel time for alternative "A" in hours |
change_A | integer number of changes for alternative "A" |
comfort_A | integer comfort level (in decreasing comfort order) for alternative "A" |
price_B | numeric price for alternative "B" in Euro |
time_B | numeric travel time for alternative "B" in hours |
change_B | integer number of changes for alternative "B" |
comfort_B | integer comfort level (in decreasing comfort order) for alternative "B" |
Ben-Akiva M, Bolduc D, Bradley M (1993). “Estimation of travel choice models with randomly distributed values of time.” Transportation Research Record, 1413, 88–97.
Meijer E, Rouwendal J (2006). “Measuring welfare effects in models with random coefficients.” Journal of Applied Econometrics, 21(2), 227–244.
This function splits choice data into a train and a test part.
train_test( x, test_proportion = NULL, test_number = NULL, by = "N", random = FALSE, seed = NULL )
x | An object of class RprobitB_data. |
test_proportion | A number between 0 and 1, the proportion of the test subsample. |
test_number | A positive integer, the number of observations in the test subsample. |
by | One of "N" (split by deciders) and "T" (split by choice occasions). |
random | If TRUE, the subsamples are built randomly. |
seed | Set a seed for building the subsamples randomly. |
See the vignette on choice data for more details.
A list with two objects of class RprobitB_data, named "train" and "test".
### simulate choices for demonstration
x <- simulate_choices(form = choice ~ covariate, N = 10, T = 10, J = 2)

### 70% of deciders in the train subsample,
### 30% of deciders in the test subsample
train_test(x, test_proportion = 0.3, by = "N")

### 2 randomly chosen choice occasions per decider in the test subsample,
### the rest in the train subsample
train_test(x, test_number = 2, by = "T", random = TRUE, seed = 1)
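Splitting by deciders can be mimicked on a plain data frame. The helper split_by_decider() below is a hypothetical base R sketch; in particular, placing the last deciders into the test part in the non-random case is an assumption and not necessarily what train_test() does.

```r
# Hypothetical helper: move a proportion of deciders to the test part.
split_by_decider <- function(df, test_proportion, id = "id") {
  ids <- unique(df[[id]])
  n_test <- round(length(ids) * test_proportion)
  test_ids <- tail(ids, n_test)   # assumption: last deciders form the test set
  in_test <- df[[id]] %in% test_ids
  list(
    train = df[!in_test, , drop = FALSE],
    test = df[in_test, , drop = FALSE]
  )
}

# Toy data: 10 deciders with 2 choice occasions each.
df <- data.frame(id = rep(1:10, each = 2), covariate = seq_len(20))
parts <- split_by_decider(df, test_proportion = 0.3)
print(c(nrow(parts$train), nrow(parts$test)))
```

Splitting by deciders keeps all choice occasions of one decider in the same part, which matters for panel data.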
Given an object of class RprobitB_fit, this function can:
change the length B of the burn-in period,
change the thinning factor Q of the Gibbs samples,
change the utility scale.
## S3 method for class 'RprobitB_fit'
transform(
  `_data`, B = NULL, Q = NULL, scale = NULL, check_preference_flip = TRUE, ...
)
_data | An object of class RprobitB_fit. |
B | The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Q | The thinning factor for the Gibbs samples, i.e. only every Qth sample is kept. |
scale | A character which determines the utility scale. It is of the form "<parameter> := <value>", where <parameter> is either the name of a fixed effect or Sigma_<j>,<j> for the <j>th diagonal element of Sigma, and <value> is the value of the fixed parameter. |
check_preference_flip | Set to TRUE to check for a flip in preferences after applying the new scale. |
... | Ignored. |
See the vignette "Model fitting" for more details: vignette("v03_model_fitting", package = "RprobitB").
The transformed RprobitB_fit object.
This function estimates a nested probit model based on a given RprobitB_fit object.
## S3 method for class 'RprobitB_fit'
update(
  object, form, re, alternatives, id, idc, standardize, impute, scale,
  R, B, Q, print_progress, prior, latent_classes, seed, ...
)
object | An object of class RprobitB_fit. |
form | A formula object that specifies the model equation, of the structure choice ~ A | B | C. Multiple covariates (of one type) are separated by a + sign. In the ordered probit model (ordered = TRUE), the formula has the simple structure choice ~ A. |
re | A character (vector) of covariates of form with random effects. |
alternatives | A character vector with the names of the choice alternatives. If not specified, the choice set is defined by the observed choices. If ordered = TRUE, alternatives is assumed to be specified with the alternatives ordered from worst to best. |
id | A character, the name of the column in choice_data that contains a unique identifier for each decision maker. |
idc | A character, the name of the column in choice_data that contains a unique identifier for each choice situation of each decision maker. |
standardize | A character vector of names of covariates that get standardized. Covariates of type 1 or 3 have to be addressed by <covariate>_<alternative>. |
impute | A character that specifies how to handle missing covariate entries in choice_data. |
scale | A character which determines the utility scale. It is of the form "<parameter> := <value>", where <parameter> is either the name of a fixed effect or Sigma_<j>,<j> for the <j>th diagonal element of Sigma, and <value> is the value of the fixed parameter. |
R | The number of iterations of the Gibbs sampler. |
B | The length of the burn-in period, i.e. a non-negative number of samples to be discarded. |
Q | The thinning factor for the Gibbs samples, i.e. only every Qth sample is kept. |
print_progress | A boolean, determining whether to print the Gibbs sampler progress and the estimated remaining computation time. |
prior | A named list of parameters for the prior distributions. See the documentation of check_prior() for details about which parameters can be specified. |
latent_classes | Either NULL (for no latent classes) or a list of parameters specifying the number of latent classes and their updating scheme. |
seed | Set a seed for the Gibbs sampling. |
... | Ignored. |
All parameters (except for object) are optional and, if not specified, are retrieved from the specification for object.
An object of class RprobitB_fit.