Title: | Estimates OIPP and OIZTNB Regression Models |
---|---|
Description: | Estimates one-inflated positive Poisson (OIPP) and one-inflated zero-truncated negative binomial (OIZTNB) regression models. A suite of ancillary statistical tools is also provided, including: estimation of positive Poisson (PP) and zero-truncated negative binomial (ZTNB) models; marginal effects and their standard errors; diagnostic likelihood ratio and Wald tests; plotting; predicted counts and expected responses; and random variate generation. The models and tools, as well as four applications, are shown in Godwin, R. T. (2024). "One-inflated zero-truncated count regression models" arXiv preprint <arXiv:2402.02272>. |
Authors: | Ryan T. Godwin [aut, cre] |
Maintainer: | Ryan T. Godwin <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2025-03-09 05:41:07 UTC |
Source: | https://github.com/rtgodwin/oneinfl |
This wrapper function calls a different function to calculate marginal effects depending on the model type. The marginal effects of the variables are evaluated at specified points, such as the sample means or averages, or at custom-defined cases.
margins(model, df, at = "AE")
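model | An object representing a fitted model. Must be of class `oneinflmodel` or `truncmodel`. |
df | A data frame containing the data used to fit the model. |
at | A character string or list specifying where to evaluate the marginal effects: `"AE"` (average effect, the default), `"EM"` (effect at means), or a named list defining a custom case. |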
The function computes marginal effects for zero-truncated Poisson or negative binomial regression models.
It handles two model types: `oneinflmodel` for one-inflated models, and `truncmodel` for standard count models.
The marginal effects are evaluated at all data points and averaged (`AE`, the default), at the sample means of the variables (`EM`), or at a custom case.
The marginal effects for dummy variables are the differences in expected outcomes between dummy values of 1 and 0.
The marginal effects are displayed along with their statistical significance, evaluated based on the chosen `at` parameter.
Prints the marginal effects, their standard errors, z-values, p-values, and significance levels.
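The distinction between `AE` and `EM` can be illustrated without the package. The sketch below assumes a positive Poisson mean, E[y | x] = lambda / (1 - exp(-lambda)) with lambda = exp(b0 + b1 * x), and uses hypothetical coefficients `b0` and `b1` with a finite-difference derivative. It is an illustration of the two evaluation points, not the package's own computation.

```r
# Hypothetical coefficients and data, for illustration only
set.seed(1)
x  <- rnorm(200)
b0 <- 0.3
b1 <- 0.5

# Mean of a zero-truncated (positive) Poisson with rate lambda = exp(b0 + b1 * x)
pp_mean <- function(x) {
  lambda <- exp(b0 + b1 * x)
  lambda / (1 - exp(-lambda))
}

# Central finite-difference derivative of the mean with respect to x
dmean <- function(x, h = 1e-6) (pp_mean(x + h) - pp_mean(x - h)) / (2 * h)

ae <- mean(dmean(x))   # "AE": effect at every observation, then averaged
em <- dmean(mean(x))   # "EM": effect evaluated at the sample mean of x

c(AE = ae, EM = em)
```

Because the mean is nonlinear in `x`, the two numbers generally differ, which is why the choice of `at` matters.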
`dEdq_nb`, `dEdq_nb_noinfl`, `dEdq_pois`, `dEdq_pois_noinfl`, `model.frame`, `model.matrix`, `numericDeriv`
    df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5))
    model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
    margins(model, df, at = "AE")                  # Average Effect
    margins(model, df, at = "EM")                  # Effect at Means
    margins(model, df, at = list(x = 1, z = 0))    # Custom case
Fits a one-inflated positive Poisson (OIPP) or one-inflated zero-truncated negative binomial (OIZTNB) regression model.
oneinfl(formula, df, dist = "negbin", start = NULL, method = "BFGS")
formula | A symbolic description of the model to be fitted. Variables before the pipe `|` enter the rate component; variables after the pipe enter the one-inflation component. |
df | A data frame containing the variables in the model. |
dist | A character string specifying the distribution to use. Options are `"Poisson"` and `"negbin"` (the default). |
start | Optional. A numeric vector of starting values for the optimization process. Defaults to `NULL`. |
method | A character string specifying the optimization method to be passed to `optim`. Defaults to `"BFGS"`. |
This function fits a regression model for one-inflated counts. One-inflated models are used when there is an excess of ones relative to a Poisson or negative binomial process.
The function supports two distributions:
`"Poisson"`: One-inflated Poisson regression.
`"negbin"`: One-inflated negative binomial regression.
The function uses numerical optimization via `optim` to estimate the parameters.
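For orientation, a one-inflated positive Poisson distribution is commonly written as a mixture of a point mass at one and a positive Poisson distribution. The notation below ($\lambda$ for the rate, $\omega$ for the one-inflation parameter) is an assumption made for illustration; the exact parameterization and link functions used by `oneinfl` are those given in Godwin (2024).

```latex
\Pr(y = 1) = \omega + (1 - \omega)\,\frac{\lambda}{e^{\lambda} - 1},
\qquad
\Pr(y = k) = (1 - \omega)\,\frac{\lambda^{k}}{(e^{\lambda} - 1)\,k!},
\quad k = 2, 3, \dots
```

Setting $\omega = 0$ recovers the positive Poisson distribution, which is what makes the non-inflated model a nested special case.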
An object of class `"oneinflmodel"` containing the following components:
`beta`: Estimated coefficients for the rate component of the model.
`gamma`: Estimated coefficients for the one-inflation component of the model.
`alpha`: Dispersion parameter (only for negative binomial distribution).
`vc`: Variance-covariance matrix of the estimated parameters.
`logl`: Log-likelihood of the fitted model.
`avgw`: Average one-inflation probability.
`absw`: Mean absolute one-inflation probability.
`dist`: The distribution used for the model ("Poisson" or "negbin").
`formula`: The formula used for the model.
`summary` for summarizing the fitted model.
`margins` for calculating the marginal effects of regressors.
`oneWald` to test for no one-inflation.
`signifWald` for testing the joint significance of a single regressor that appears before and after the pipe `|`.
`oneplot` for plotting actual and predicted counts.
`predict` for expected response/dependent variable at each observation.
`truncreg` for fitting positive Poisson (PP) and zero-truncated negative binomial (ZTNB) models.
`oneLRT` to test for no one-inflation or no overdispersion using a nested PP, OIPP, or ZTNB model.
    # Example usage
    df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 1) + 1)
    model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
    summary(model)
    margins(model, df)
    oneWald(model)
    predict(model, df)
Performs a likelihood ratio test (LRT) to compare two nested models estimated by `oneinfl` or `truncreg`. It calculates the LRT statistic and its associated p-value, testing whether the more complex model provides a significantly better fit to the data than the simpler model.
oneLRT(mod0, mod1)
mod0 | A model object (typically the simpler model) estimated using `oneinfl` or `truncreg`. |
mod1 | A model object (typically the more complex model) estimated using `oneinfl` or `truncreg`. |
The function extracts the log-likelihoods and number of parameters from the two models. It then calculates the LRT statistic:
$$LRT = -2\,(\ell_0 - \ell_1)$$
where $\ell_0$ and $\ell_1$ are the log-likelihoods of the simpler and more complex models, respectively. The degrees of freedom for the test are equal to the difference in the number of parameters between the models.
The likelihood ratio test is commonly used to test for:
Overdispersion: Comparing a Poisson model to a negative binomial model.
One-inflation: Comparing a one-inflated model to a non-one-inflated model.
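The calculation described above can be reproduced by hand in base R. The log-likelihoods and parameter counts below are hypothetical values chosen for illustration, not output from fitted models.

```r
# Hypothetical log-likelihoods and parameter counts, for illustration only
logl0 <- -250.4; k0 <- 2   # simpler (restricted) model
logl1 <- -245.1; k1 <- 4   # more complex (unrestricted) model

lrt_stat <- -2 * (logl0 - logl1)          # LRT statistic
df_lrt   <- k1 - k0                       # difference in number of parameters
p_value  <- pchisq(lrt_stat, df = df_lrt, lower.tail = FALSE)

list(LRTstat = lrt_stat, pval = p_value)
```

A small p-value favours the more complex model over the simpler one.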
A list with the following components:
`LRTstat`: The likelihood ratio test statistic.
`pval`: The p-value associated with the test statistic, based on a chi-squared distribution.
`oneinfl` for fitting one-inflated models.
`truncreg` for fitting zero-truncated models.
`pchisq` for the chi-squared distribution.
    # Example: One-inflation test
    df <- data.frame(y = rpois(100, lambda = 5), x = rnorm(100), z = rnorm(100))
    OIZTNB <- oneinfl(y ~ x | z, df = df, dist = "negbin")
    ZTNB <- truncreg(y ~ x, df = df, dist = "negbin")
    oneLRT(OIZTNB, ZTNB)
    # Example: Overdispersion test
    OIPP <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
    oneLRT(OIZTNB, OIPP)
Generates a bar plot of observed count data and overlays predicted values from one or more models fitted using `oneinfl` or `truncreg`.
oneplot(model1, model2, model3, model4, df, maxpred, ylimit, ccex)
model1 | The first fitted model object, either a one-inflated model (class `oneinflmodel`) or a truncated model (class `truncmodel`). |
model2 | Optional. A second fitted model object, structured similarly to `model1`. |
model3 | Optional. A third fitted model object, structured similarly to `model1`. |
model4 | Optional. A fourth fitted model object, structured similarly to `model1`. |
df | A data frame containing the variables used in the models. |
maxpred | Optional. The maximum count value to include in the plot. Defaults to the maximum observed count. |
ylimit | Optional. The upper limit for the y-axis. Defaults to 1.1 times the highest observed frequency. |
ccex | Optional. A numeric value controlling the size of plot points and lines. Defaults to |
This function visualizes observed count data as a bar plot and overlays predicted values from up to four models. The function automatically detects the type of model (Poisson or negative binomial; one-inflated or truncated) and adjusts the plot accordingly. Predictions are generated using the `pred` function.
Model types are distinguished by different point and line styles:
Poisson (PP): Dark magenta, triangle-down
Zero-truncated negative binomial (ZTNB): Red, diamond
One-inflated Poisson (OIPP): Green, triangle-up
One-inflated zero-truncated negative binomial (OIZTNB): Blue, circle
The legend in the top-right corner of the plot indicates the models displayed.
A plot is generated but no values are returned.
`oneinfl` for fitting one-inflated models.
`truncreg` for fitting truncated models.
`pred` for generating predictions used in the plot.
    # Example usage
    df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5) + 1)
    model1 <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
    model2 <- truncreg(y ~ x, df = df, dist = "negbin")
    oneplot(model1, model2, df = df, maxpred = 10)
Performs a Wald test to evaluate the significance of the one-inflation parameters in a model estimated using `oneinfl`.
oneWald(model)
model | A model object of class `oneinflmodel`. |
The Wald test evaluates the null hypothesis that all one-inflation parameters (`gamma`) are equal to zero, indicating no one-inflation. The test statistic is calculated as:
$$W = \gamma' V^{-1} \gamma$$
where $\gamma$ is the vector of one-inflation parameters and $V$ is their variance-covariance matrix. The p-value is computed using a chi-squared distribution with degrees of freedom equal to the length of $\gamma$.
This test is commonly used to determine whether a one-inflated model provides a significantly better fit than a non-one-inflated counterpart.
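The quadratic form behind the Wald statistic can be computed directly in base R. The coefficient vector and covariance matrix below are hypothetical values for illustration; in practice they would come from the `gamma` and `vc` components of a fitted model.

```r
# Hypothetical estimates: two one-inflation coefficients and their covariance matrix
gamma_hat <- c(0.8, -0.3)
V <- matrix(c(0.04, 0.01,
              0.01, 0.09), nrow = 2, byrow = TRUE)

# W = gamma' V^{-1} gamma, using solve(V, gamma) rather than an explicit inverse
W    <- as.numeric(t(gamma_hat) %*% solve(V, gamma_hat))
pval <- pchisq(W, df = length(gamma_hat), lower.tail = FALSE)

list(W = W, pval = pval)
```

Solving the linear system instead of inverting `V` is a numerical-stability choice and does not change the statistic.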
A list with the following components:
`W`: The Wald test statistic.
`pval`: The p-value associated with the test statistic, based on a chi-squared distribution.
`oneinfl` for fitting one-inflated models.
`oneLRT` for a likelihood ratio test of nested models.
`pchisq` for the chi-squared distribution.
    # Example usage
    df <- data.frame(y = rpois(100, lambda = 5), x = rnorm(100), z = rnorm(100))
    OIZTNB <- oneinfl(y ~ x | z, df = df, dist = "negbin")
    oneWald(OIZTNB)
Calculates the predicted expected response for a model fitted using `oneinfl` or `truncreg`.
## S3 method for class 'oneinflmodel'
predict(model, df, type = "response")
model | A fitted model object of class `oneinflmodel`. |
df | A data frame containing the predictor variables used in the model. |
type | A character string specifying the type of prediction. Currently, only `"response"` is supported. |
This function computes the expected response based on the fitted model. The computation differs depending on the distribution. For Poisson (OIPP), predicted values are computed using `E_pois`. For negative binomial (OIZTNB), predicted values are computed using `E_negbin`.
A numeric vector of predicted expected responses for the observations in `df`.
`oneinfl` for fitting one-inflated models.
`E_pois`, `E_negbin` for the expected value calculations.
    # Example usage
    df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5))
    model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
    predict(model, df = df, type = "response")
Calculates the predicted expected response for a model fitted using `oneinfl` or `truncreg`.
## S3 method for class 'truncmodel'
predict(model, df, type = "response")
model | A fitted model object of class `truncmodel`. |
df | A data frame containing the predictor variables used in the model. |
type | A character string specifying the type of prediction. Currently, only `"response"` is supported. |
This function computes the expected response based on the fitted model. The computation differs depending on the distribution. For Poisson (PP), predicted values are computed using `E_pois_noinfl`. For negative binomial (ZTNB), predicted values are computed using `E_negbin_noinfl`.
A numeric vector of predicted expected responses for the observations in `df`.
`oneinfl` for fitting one-inflated models.
`truncreg` for fitting truncated models.
`E_pois_noinfl`, `E_negbin_noinfl` for the expected value calculations.
    # Example usage
    df <- data.frame(x = rnorm(100), y = rpois(100, lambda = 5))
    model <- truncreg(y ~ x, df = df, dist = "Poisson")
    predict(model, df = df, type = "response")
Simulates count data from a one-inflated Poisson process using specified parameters for the rate and one-inflation components.
roipp(b, g, X, Z)
b | A numeric vector of coefficients for the rate component. |
g | A numeric vector of coefficients for the one-inflation component. |
X | A matrix or data frame of predictor variables for the rate component. |
Z | A matrix or data frame of predictor variables for the one-inflation component. |
This function generates count data from a one-inflated Poisson process. The process combines:
A Poisson distribution for counts greater than one.
A one-inflation component that adjusts the probability of observing a count of one.
The algorithm:
Calculates the rate parameter ($\lambda$) as $\exp(Xb)$.
Computes the one-inflation probabilities ($\omega$) based on $Zg$.
Simulates counts for each observation:
Draws a random number to determine whether the count is one.
Iteratively calculates probabilities for higher counts until the random number is matched.
This function is useful for generating synthetic data for testing or simulation studies involving one-inflated Poisson models.
A numeric vector of simulated count data.
`oneinfl` for fitting one-inflated models.
    # Example usage
    set.seed(123)
    X <- matrix(rnorm(100), ncol = 2)
    Z <- matrix(rnorm(100), ncol = 2)
    b <- c(0.5, -0.2)
    g <- c(1.0, 0.3)
    simulated_data <- roipp(b, g, X, Z)
    print(simulated_data)
Simulates count data from a one-inflated, zero-truncated negative binomial (OIZTNB) process using specified parameters for the rate, one-inflation, and dispersion components.
roiztnb(b, g, alpha, X, Z)
b | A numeric vector of coefficients for the rate component. |
g | A numeric vector of coefficients for the one-inflation component. |
alpha | A numeric value representing the dispersion parameter for the negative binomial distribution. |
X | A matrix or data frame of predictor variables for the rate component. |
Z | A matrix or data frame of predictor variables for the one-inflation component. |
This function generates count data from a one-inflated, zero-truncated negative binomial process. The process combines:
A negative binomial distribution for counts greater than one.
A one-inflation component that adjusts the probability of observing a count of one.
The algorithm:
Calculates the rate parameter ($\lambda$) as $\exp(Xb)$.
Computes the one-inflation probabilities ($\omega$) based on $Zg$.
Computes the negative binomial dispersion parameter.
Simulates counts for each observation:
Draws a random number to determine whether the count is one.
Iteratively calculates probabilities for higher counts until the random number is matched.
This function is useful for generating synthetic data for testing or simulation studies involving one-inflated, zero-truncated negative binomial models.
A numeric vector of simulated count data.
`oneinfl` for fitting one-inflated models.
    # Example usage
    set.seed(123)
    X <- matrix(rnorm(100), ncol = 2)
    Z <- matrix(rnorm(100), ncol = 2)
    b <- c(0.5, -0.2)
    g <- c(1.0, 0.3)
    alpha <- 1.5
    simulated_data <- roiztnb(b, g, alpha, X, Z)
    print(simulated_data)
Simulates count data from a zero-truncated Poisson process using specified parameters for the rate component.
rpp(b, X)
b | A numeric vector of coefficients for the rate component. |
X | A matrix or data frame of predictor variables for the rate component. |
This function generates count data from a zero-truncated Poisson process, which models count data without zeros. The process involves:
Calculating the rate parameter ($\lambda$) as $\exp(Xb)$.
Iteratively computing probabilities for counts starting from 1 and adding to the cumulative probability until a randomly drawn value is matched.
This function is useful for generating synthetic data for testing or simulation studies involving zero-truncated Poisson models.
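The two steps above amount to inverse-CDF sampling, which can be sketched in base R as follows. The function name `rztp_sketch` is hypothetical and this is not the package's source code; it only assumes the standard zero-truncated Poisson probabilities P(y = k) = lambda^k / ((exp(lambda) - 1) k!).

```r
# Illustrative sampler; rztp_sketch is a hypothetical name, not a package function
rztp_sketch <- function(b, X) {
  lambda <- as.numeric(exp(as.matrix(X) %*% b))   # rate for each observation
  vapply(lambda, function(l) {
    u   <- runif(1)                  # uniform draw to be matched
    k   <- 1
    pk  <- l / (exp(l) - 1)          # P(y = 1) under zero truncation
    cdf <- pk
    while (u > cdf) {
      k   <- k + 1
      pk  <- pk * l / k              # recursion: P(y = k) = P(y = k - 1) * lambda / k
      cdf <- cdf + pk
    }
    k
  }, numeric(1))
}

set.seed(123)
X <- matrix(rnorm(100), ncol = 2)
b <- c(0.5, -0.2)
y_sim <- rztp_sketch(b, X)
table(y_sim)
```

Every simulated value is at least one, since the cumulative probability starts at the count of one rather than zero.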
A numeric vector of simulated count data.
`truncreg` for fitting zero-truncated models.
    # Example usage
    set.seed(123)
    X <- matrix(rnorm(100), ncol = 2)
    b <- c(0.5, -0.2)
    simulated_data <- rpp(b, X)
    print(simulated_data)
Performs a Wald test to evaluate the joint significance of a predictor variable in both the rate and one-inflation components of a model.
signifWald(model, varname)
model | A fitted model object of class `oneinflmodel`. |
varname | A character string specifying the name of the predictor variable to test. |
This function tests the null hypothesis that the coefficients for the specified predictor variable are jointly equal to zero in both the rate (`beta`) and one-inflation (`gamma`) components of the model. The test statistic is calculated as:
$$W = \theta' V^{-1} \theta$$
where $\theta$ is the vector of coefficients for the predictor in the rate and one-inflation components, and $V$ is their variance-covariance matrix. The p-value is computed using a chi-squared distribution with 2 degrees of freedom.
A list with the following components:
`W`: The Wald test statistic.
`pval`: The p-value associated with the test statistic, based on a chi-squared distribution with 2 degrees of freedom.
`oneinfl` for fitting one-inflated models.
`oneWald` for a general Wald test of one-inflation parameters.
    # Example usage
    set.seed(123)
    df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5))
    model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
    result <- signifWald(model, varname = "x")
    print(result$W)     # Wald test statistic
    print(result$pval)  # p-value
Provides a summary of the fitted model, including estimated coefficients, standard errors, significance levels, and other relevant statistics.
## S3 method for class 'oneinflmodel'
summary(object, ...)
object | A model object of class `oneinflmodel`. |
... | Additional arguments (currently unused). |
This function generates a detailed summary of the fitted model, including:
Estimated coefficients for the rate component (`beta`).
Estimated coefficients for the one-inflation component (`gamma`).
Standard errors.
z-statistics, associated p-values, and corresponding significance codes.
Average, and average absolute, one-inflation.
Log-likelihood of the fitted model.
Prints a summary table of coefficients, standard errors, z-values, p-values, significance codes, one-inflation probabilities, and log-likelihood.
`oneinfl` for fitting one-inflated models.
    # Example usage
    df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5))
    model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
    summary(model)
Provides a summary of the fitted model, including estimated coefficients, standard errors, significance levels, and other relevant statistics.
## S3 method for class 'truncmodel'
summary(object, ...)
object | A model object of class `truncmodel`. |
... | Additional arguments (currently unused). |
This function generates a detailed summary of the fitted model, including:
Estimated coefficients for the rate component (`beta`).
Standard errors.
z-statistics, associated p-values, and corresponding significance codes.
Log-likelihood of the fitted model.
Prints a summary table of coefficients, standard errors, z-values, p-values, significance codes, and log-likelihood.
`truncreg` for fitting truncated regression models.
    # Example usage
    df <- data.frame(x = rnorm(100), y = rpois(100, lambda = 5))
    model <- truncreg(y ~ x, df = df, dist = "Poisson")
    summary(model)
Fits a positive Poisson (PP) or zero-truncated negative binomial (ZTNB) regression model.
truncreg(formula, df, dist = "negbin", start = NULL, method = "BFGS")
formula | A symbolic description of the model to be fitted. |
df | A data frame containing the variables in the model. |
dist | A character string specifying the distribution to use. Options are `"Poisson"` and `"negbin"` (the default). |
start | Optional. A numeric vector of starting values for the optimization process. Defaults to `NULL`. |
method | A character string specifying the optimization method to be passed to `optim`. Defaults to `"BFGS"`. |
This function fits a regression model for zero-truncated counts. Zero-truncated models are used when the count data does not include zeros, such as in cases where only positive counts are observed.
The function supports two distributions:
`"Poisson"`: Zero-truncated Poisson regression.
`"negbin"`: Zero-truncated negative binomial regression.
The function uses numerical optimization via `optim` to estimate the parameters.
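To make the estimation idea concrete, the sketch below fits a zero-truncated Poisson regression by passing a hand-written negative log-likelihood to `optim`. The simulated data, the starting values, and the use of the inverse Hessian as a variance estimate are assumptions for illustration and are not taken from the package source.

```r
# Simulate strictly positive counts (hypothetical data; zeros are dropped)
set.seed(42)
x <- rnorm(500)
y <- rpois(500, lambda = exp(0.5 + 0.3 * x))
keep <- y > 0
x <- x[keep]; y <- y[keep]
X <- cbind(1, x)

# Negative log-likelihood of the zero-truncated Poisson:
# log f(y) = y * log(lambda) - log(exp(lambda) - 1) - log(y!)
negll <- function(b) {
  lambda <- as.numeric(exp(X %*% b))
  -sum(y * log(lambda) - log(expm1(lambda)) - lgamma(y + 1))
}

fit <- optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)

beta_hat <- fit$par                 # estimated coefficients
vc_hat   <- solve(fit$hessian)      # inverse Hessian as a variance-covariance estimate
se_hat   <- sqrt(diag(vc_hat))
cbind(estimate = beta_hat, se = se_hat, logl = -fit$value)
```

Discarding the zeros from an ordinary Poisson sample leaves a zero-truncated Poisson sample with the same rate, so the estimates should land near the generating values 0.5 and 0.3.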
An object of class `"truncmodel"` containing the following components:
`beta`: Estimated coefficients for the regression model.
`alpha`: Dispersion parameter (only for negative binomial distribution).
`vc`: Variance-covariance matrix of the estimated parameters.
`logl`: Log-likelihood of the fitted model.
`dist`: The distribution used for the model ("Poisson" or "negbin").
`formula`: The formula used for the model.
`summary` for summarizing the fitted model.
    # Example usage
    df <- data.frame(x = rnorm(100), y = rpois(100, lambda = 1) + 1)
    model <- truncreg(y ~ x, df = df, dist = "Poisson")
    summary(model)