| Title: | Estimates OIPP and OIZTNB Regression Models |
|---|---|
| Description: | Estimates one-inflated positive Poisson (OIPP) and one-inflated zero-truncated negative binomial (OIZTNB) regression models. A suite of ancillary statistical tools is also provided, including: estimation of positive Poisson (PP) and zero-truncated negative binomial (ZTNB) models; marginal effects and their standard errors; diagnostic likelihood ratio and Wald tests; plotting; predicted counts and expected responses; and random variate generation. The models and tools, as well as four applications, are shown in Godwin, R. T. (2024). "One-inflated zero-truncated count regression models", arXiv preprint <doi:10.48550/arXiv.2402.02272>. |
| Authors: | Ryan T. Godwin [aut, cre] |
| Maintainer: | Ryan T. Godwin <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.2 |
| Built: | 2026-05-16 07:20:57 UTC |
| Source: | https://github.com/rtgodwin/oneinfl |
Processes a model formula and a data frame to generate design matrices ('X' and 'Z') and a response vector ('y') for regression models, including support for complex formulas with '|' operators.
makeXZy(formula, df)
- 'formula': A symbolic description of the model, where the left-hand side specifies the response variable and the right-hand side specifies predictors. Formulas can include a '|' operator to separate predictors for different components of a model.
- 'df': A data frame containing the variables specified in the formula.
This function processes the formula to extract and construct:
- 'X': The main design matrix.
- 'Z': A secondary design matrix (if a '|' operator is used in the formula, separating components).
- 'y': The response variable.
It handles cases where the formula specifies:
- Only the main component (e.g., y ~ x1 + x2).
- A secondary component using the '|' operator (e.g., y ~ x1 + x2 | z1 + z2).
A list containing the following components:
- 'X': A design matrix for the main predictors.
- 'Z': A design matrix for additional predictors (e.g., for a secondary process in a two-component model).
- 'y': The response vector extracted from the formula.
model.matrix, model.frame, model.response
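As a rough illustration of the processing described for this function, the following base R sketch splits a two-part formula at the '|' operator and builds 'X', 'Z', and 'y' with model.frame, model.matrix, and model.response. The helper name split_xzy is hypothetical, and returning Z = NULL when no '|' is present is an assumption; makeXZy itself may behave differently.

```r
# Hypothetical helper: not the package's makeXZy(), only a sketch of the idea.
split_xzy <- function(formula, df) {
  lhs <- formula[[2]]   # response, e.g. y
  rhs <- formula[[3]]   # right-hand side, possibly `x1 + x2 | z1 + z2`

  # Detect a top-level `|` call and separate its two sides.
  if (is.call(rhs) && identical(rhs[[1]], as.name("|"))) {
    rhs_x <- rhs[[2]]
    rhs_z <- rhs[[3]]
  } else {
    rhs_x <- rhs
    rhs_z <- NULL
  }

  # Main component: response vector y and design matrix X.
  f_x <- eval(bquote(.(lhs) ~ .(rhs_x)))
  mf  <- model.frame(f_x, data = df)
  X   <- model.matrix(f_x, data = mf)
  y   <- model.response(mf)

  # Secondary component: design matrix Z (assumed NULL when there is no `|`).
  Z <- NULL
  if (!is.null(rhs_z)) {
    f_z <- eval(bquote(~ .(rhs_z)))
    Z   <- model.matrix(f_z, data = df)
  }

  list(X = X, Z = Z, y = y)
}

set.seed(1)
df  <- data.frame(x = rnorm(10), z = rnorm(10), y = rpois(10, lambda = 2) + 1)
out <- split_xzy(y ~ x | z, df)
colnames(out$X)
colnames(out$Z)
```

Because '|' binds more loosely than '+' in R, a formula such as y ~ x1 + x2 | z1 + z2 parses with '|' at the top of the right-hand side, which is what the sketch relies on.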
This wrapper function calls a different function to calculate marginal effects depending on the model type. The marginal effects of the variables are evaluated at specified points, such as the sample means or averages, or at custom-defined cases.
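margins(model, df, at = "AE", verbose = TRUE)
- 'model': An object representing a fitted model. Must be of class 'oneinflmodel' or 'truncmodel'.
- 'df': A data frame containing the variables used to fit the model.
- 'at': A character string or list specifying where to evaluate the marginal effects: "AE" (the default) evaluates them at every data point and averages the results; "EM" evaluates them at the sample means of the variables; a named list such as list(x = 1, z = 0) evaluates them at a custom case.
- 'verbose': Logical; if TRUE (the default), the results are printed; if FALSE, they are returned as a list.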
The function computes marginal effects for one-inflated and zero-truncated Poisson or negative binomial regression models. It dispatches on the model class: 'oneinflmodel' for one-inflated models and 'truncmodel' for zero-truncated models without one-inflation. The marginal effects are evaluated at all data points and averaged ('AE', the default), at the sample means of the variables ('EM'), or at a custom case. For dummy variables, the reported effect is the difference in expected outcomes between values of the dummy of 1 and 0. The marginal effects are displayed along with their statistical significance, evaluated based on the chosen 'at' parameter.
If verbose=TRUE (default), prints the marginal effects, their standard errors, z-values, p-values, and significance levels.
If verbose=FALSE, returns a list containing the following components:
- 'where': A description of how the marginal effects have been evaluated.
- 'dEdq': The marginal effect. The partial derivative of the expected count with respect to a variable q in the X and/or Z matrix, or the difference in expectation if q is binary.
- 'se': The standard errors of the marginal effects, evaluated numerically using a Jacobian via the delta method.
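To make the 'AE' and 'EM' evaluation points and the delta-method standard error more concrete, here is a self-contained base R sketch. It uses an ordinary Poisson regression fitted with glm, where the expected count is exp(Xb), rather than the one-inflated or truncated expectations used by the package. The function names ae, em, and num_jacobian are hypothetical and are not part of oneinfl.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.2 + 0.4 * x))

fit <- glm(y ~ x, family = poisson)
b   <- coef(fit)          # estimated coefficients
V   <- vcov(fit)          # their variance-covariance matrix
X   <- model.matrix(fit)  # design matrix (intercept and x)

# With E[y] = exp(X b), the marginal effect of x is b_x * exp(X b).
ae <- function(b) mean(b[2] * exp(X %*% b))          # averaged over all rows ("AE")
em <- function(b) b[2] * exp(sum(colMeans(X) * b))   # at the sample means ("EM")

# Delta method: numeric Jacobian of the effect with respect to b,
# approximated by central differences.
num_jacobian <- function(f, b, h = 1e-6) {
  sapply(seq_along(b), function(j) {
    step <- replace(numeric(length(b)), j, h)
    (f(b + step) - f(b - step)) / (2 * h)
  })
}

J     <- num_jacobian(ae, b)
se_ae <- sqrt(drop(t(J) %*% V %*% J))

c(AE = ae(b), EM = unname(em(b)), se_AE = se_ae)
```

The same pattern (effect as a function of the parameters, numeric Jacobian, then J V J') carries over to other expectation functions; only the formula inside ae and em would change.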
dEdq_nb, dEdq_nb_noinfl, dEdq_pois, dEdq_pois_noinfl, model.frame, model.matrix, numericDeriv
# Shift by 1 so that y contains no zeros
df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5) + 1)
model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
margins(model, df, at = "AE")                    # Average Effect
margins(model, df, at = "EM", verbose = FALSE)   # Effect at Means, suppress printing
margins(model, df, at = list(x = 1, z = 0))      # Custom case
Fits a one-inflated positive Poisson (OIPP) or one-inflated zero-truncated negative binomial (OIZTNB) regression model.
oneinfl(formula, df, dist = "negbin", start = NULL, method = "BFGS")
- 'formula': A symbolic description of the model to be fitted. Variables before the pipe '|' enter the usual rate parameter; variables after the pipe enter the one-inflation parameter.
- 'df': A data frame containing the variables in the model.
- 'dist': A character string specifying the distribution to use. Options are '"Poisson"' or '"negbin"'.
- 'start': Optional. A numeric vector of starting values for the optimization process. Defaults to 'NULL', in which case the function attempts to choose starting values automatically.
- 'method': A character string specifying the optimization method to be passed to optim. Defaults to '"BFGS"'.
This function fits a regression model for one-inflated counts. One-inflated models are used when there are an excess number of ones, relative to a Poisson or negative binomial process.
The function supports two distributions:
- '"Poisson"': One-inflated Poisson regression.
- '"negbin"': One-inflated negative binomial regression.
The function uses numerical optimization via optim to estimate the parameters.
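For orientation, the sketch below writes out a one-inflated positive Poisson probability mass function in base R and checks that it sums to one. It assumes the common mixture form in which a count of one receives extra mass w on top of a positive Poisson with rate lambda. The function name doipp_sketch is hypothetical, w is supplied directly as a number, and the package's internal parameterization of the one-inflation component through the variables after the pipe may differ.

```r
# Hypothetical pmf sketch (assumed mixture form, not taken from the package source):
#   P(Y = 1) = w + (1 - w) * PP(1)
#   P(Y = y) = (1 - w) * PP(y)        for y = 2, 3, ...
# where PP is the positive (zero-truncated) Poisson pmf with rate lambda.
doipp_sketch <- function(y, lambda, w) {
  pp <- dpois(y, lambda) / (1 - dpois(0, lambda))   # positive Poisson pmf, y >= 1
  ifelse(y == 1, w + (1 - w) * pp, (1 - w) * pp)
}

lambda <- 2
w      <- 0.3
probs  <- doipp_sketch(1:100, lambda, w)

sum(probs)                   # total probability over the support 1, 2, ..., 100
doipp_sketch(1, lambda, w)   # probability of a one with one-inflation
doipp_sketch(1, lambda, 0)   # probability of a one without one-inflation
```

A log-likelihood for numerical optimization would then be the sum of the logs of these probabilities over the observed counts, with lambda and w expressed as functions of the regressors.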
An object of class '"oneinflmodel"' containing the following components:
- 'beta': Estimated coefficients for the rate component of the model.
- 'gamma': Estimated coefficients for the one-inflation component of the model.
- 'alpha': Dispersion parameter (only for negative binomial distribution).
- 'vc': Variance-covariance matrix of the estimated parameters.
- 'logl': Log-likelihood of the fitted model.
- 'avgw': Average one-inflation probability.
- 'absw': Mean absolute one-inflation probability.
- 'dist': The distribution used for the model ("Poisson" or "negbin").
- 'formula': The formula used for the model.
summary for summarizing the fitted model.
margins for calculating the marginal effects of regressors.
oneWald to test for no one-inflation.
signifWald for testing the joint significance of a single regressor that appears before and after the pipe '|'.
oneplot for plotting actual and predicted counts.
predict for expected response/dependent variable at each observation.
truncreg for fitting positive Poisson (PP) and zero-truncated negative binomial (ZTNB) models.
oneLRT to test for no one-inflation or no overdispersion using a nested PP, OIPP, or ZTNB model.
# Example usage
df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 1) + 1)
model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
summary(model)
margins(model, df)
oneWald(model)
predict(model, df = df)
Performs a likelihood ratio test (LRT) to compare two nested models estimated by
oneinfl or truncreg. It calculates the LRT statistic
and its associated p-value, testing whether the more complex model provides
a significantly better fit to the data than the simpler model.
oneLRT(mod0, mod1)
- 'mod0': A model object (typically the simpler model) estimated using oneinfl or truncreg.
- 'mod1': A model object (typically the more complex model) estimated using oneinfl or truncreg.
The function extracts the log-likelihoods and number of parameters from the two models. It then calculates the LRT statistic:
$$\mathrm{LRT} = -2\,(\ell_0 - \ell_1),$$
where $\ell_0$ and $\ell_1$ are the log-likelihoods of the simpler and more complex models, respectively. The degrees of freedom for the test are equal to the difference in the number of parameters between the models.
The likelihood ratio test is commonly used to test for:
- Overdispersion: Comparing a Poisson model to a negative binomial model.
- One-inflation: Comparing a one-inflated model to a non-one-inflated model.
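The calculation can be sketched in a few lines of base R. The function name lrt_sketch and its arguments (log-likelihoods logl0 and logl1, parameter counts k0 and k1) are hypothetical and only mirror the description of the test; they are not the package's internals, and the numbers in the usage lines are made up.

```r
lrt_sketch <- function(logl0, logl1, k0, k1) {
  stat <- -2 * (logl0 - logl1)                         # LRT statistic
  dof  <- k1 - k0                                      # difference in number of parameters
  pval <- pchisq(stat, df = dof, lower.tail = FALSE)   # upper-tail chi-squared probability
  list(LRTstat = stat, pval = pval)
}

# Made-up log-likelihoods for a simpler model (3 parameters)
# and a more complex model (5 parameters).
res <- lrt_sketch(logl0 = -250.0, logl1 = -245.5, k0 = 3, k1 = 5)
res$LRTstat
res$pval
```

A small p-value suggests that the extra parameters of the more complex model improve the fit by more than chance alone would explain.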
A list with the following components:
- 'LRTstat': The likelihood ratio test statistic.
- 'pval': The p-value associated with the test statistic, based on a chi-squared distribution.
oneinfl for fitting one-inflated models.
truncreg for fitting zero-truncated models.
pchisq for the chi-squared distribution.
# Example: One-inflation test
# Shift by 1 so that y contains no zeros
df <- data.frame(y = rpois(100, lambda = 5) + 1, x = rnorm(100), z = rnorm(100))
OIZTNB <- oneinfl(y ~ x | z, df = df, dist = "negbin")
ZTNB <- truncreg(y ~ x, df = df, dist = "negbin")
oneLRT(ZTNB, OIZTNB)   # simpler model first, as in oneLRT(mod0, mod1)
# Example: Overdispersion test
OIPP <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
oneLRT(OIPP, OIZTNB)   # simpler model first
Generates a bar plot of observed count data and overlays predicted values
from one or more models fitted using oneinfl or truncreg.
oneplot(
  model1,
  model2 = NULL,
  model3 = NULL,
  model4 = NULL,
  df,
  maxpred = NULL,
  ylimit = NULL,
  ccex = 1.5
)
- 'model1': The first fitted model object, either a one-inflated model (class '"oneinflmodel"') or a truncated model (class '"truncmodel"').
- 'model2': Optional. A second fitted model object, structured similarly to 'model1'.
- 'model3': Optional. A third fitted model object.
- 'model4': Optional. A fourth fitted model object.
- 'df': A data frame containing the variables used in the models.
- 'maxpred': Optional. The maximum count value to include in the plot. Defaults to the maximum observed count.
- 'ylimit': Optional. The upper limit for the y-axis. Defaults to 1.1 times the highest observed frequency.
- 'ccex': Optional. A numeric value controlling the size of plot points and lines. Defaults to 1.5.
This function visualizes observed count data as a bar plot and overlays predicted
values from up to four models. The function automatically detects the type of model
(Poisson or negative binomial; one-inflated or truncated) and adjusts the plot accordingly.
Predictions are generated using the pred function.
Model types are distinguished by different point and line styles:
- Positive Poisson (PP): Dark magenta, triangle-down
- Zero-truncated negative binomial (ZTNB): Red, diamond
- One-inflated positive Poisson (OIPP): Green, triangle-up
- One-inflated zero-truncated negative binomial (OIZTNB): Blue, circle
The legend in the top-right corner of the plot indicates the models displayed.
A plot is generated but no values are returned.
oneinfl for fitting one-inflated models.
truncreg for fitting truncated models.
pred for generating predictions used in the plot.
# Example usage
df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5) + 1)
model1 <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
model2 <- truncreg(y ~ x, df = df, dist = "Poisson")
oneplot(model1, model2, df = df, maxpred = 10)
Performs a Wald test to evaluate the significance of the one-inflation parameters
in a model estimated using oneinfl.
oneWald(model)
- 'model': A model object of class '"oneinflmodel"' estimated using oneinfl.
The Wald test evaluates the null hypothesis that all one-inflation parameters (gamma) are equal to zero, indicating no one-inflation. The test statistic is calculated as:
$$W = \gamma^{\top} V^{-1} \gamma,$$
where $\gamma$ is the vector of one-inflation parameters and $V$ is their variance-covariance matrix. The p-value is computed using a chi-squared distribution with degrees of freedom equal to the length of $\gamma$.
This test is commonly used to determine whether a one-inflated model provides a significantly better fit than a non-one-inflated counterpart.
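As a minimal sketch of the computation, assuming the usual Wald quadratic form (the coefficient vector times the inverse of its covariance matrix times the coefficient vector), the statistic and p-value can be obtained in base R as follows. The function name wald_sketch and the numeric inputs are hypothetical; in practice the one-inflation coefficients and the matching block of the variance-covariance matrix would come from a fitted model.

```r
wald_sketch <- function(gamma, V) {
  W    <- drop(t(gamma) %*% solve(V) %*% gamma)                # quadratic form
  pval <- pchisq(W, df = length(gamma), lower.tail = FALSE)    # upper-tail probability
  list(W = W, pval = pval)
}

# Made-up estimates: two one-inflation coefficients and their covariance matrix.
gamma_hat <- c(0.8, -0.5)
V_hat     <- matrix(c(0.10, 0.02,
                      0.02, 0.05), nrow = 2, byrow = TRUE)
wald_sketch(gamma_hat, V_hat)
```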
A list with the following components:
- 'W': The Wald test statistic.
- 'pval': The p-value associated with the test statistic, based on a chi-squared distribution.
oneinfl for fitting one-inflated models.
oneLRT for a likelihood ratio test of nested models.
pchisq for the chi-squared distribution.
# Example usage
# Shift by 1 so that y contains no zeros
df <- data.frame(y = rpois(100, lambda = 5) + 1, x = rnorm(100), z = rnorm(100))
OIZTNB <- oneinfl(y ~ x | z, df = df, dist = "negbin")
oneWald(OIZTNB)
Calculates the predicted expected response for a model fitted using
oneinfl or truncreg.
## S3 method for class 'oneinflmodel'
predict(object, ...)
- 'object': An object of class 'oneinflmodel'.
- '...': Additional argument 'df', a data frame used to calculate the expected value of the response variable.
This function computes the expected response based on the fitted model. The computation
differs depending on the distribution. For Poisson (OIPP), predicted values are
computed using E_pois. For Negative Binomial (OIZTNB), predicted
values are computed using E_negbin.
A numeric vector of predicted expected responses for the observations in df.
oneinfl for fitting one-inflated models.
E_pois, E_negbin, for the expected value calculations.
# Example usage
# Shift by 1 so that y contains no zeros
df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5) + 1)
model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
predict(model, df = df)
Calculates the predicted expected response for a model fitted using
oneinfl or truncreg.
## S3 method for class 'truncmodel'
predict(object, ...)
- 'object': An object of class 'truncmodel'.
- '...': Additional argument 'df', a data frame used to calculate the expected value of the response variable.
This function computes the expected response based on the fitted model. The computation
differs depending on the distribution. For Poisson (PP), predicted values are computed
using E_pois_noinfl. For Negative Binomial (ZTNB), predicted values are
computed using E_negbin_noinfl.
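As a point of reference, the expected value of a positive (zero-truncated) Poisson variable with rate lambda is lambda / (1 - exp(-lambda)). The sketch below evaluates it for each row of a design matrix, assuming a log link lambda = exp(Xb). The function name e_pp_sketch and the coefficient values are hypothetical, and the internals of E_pois_noinfl are not shown in this manual.

```r
e_pp_sketch <- function(b, X) {
  lambda <- exp(drop(X %*% b))   # rate for each observation (assumed log link)
  lambda / (1 - exp(-lambda))    # mean of a Poisson variable conditioned on y >= 1
}

set.seed(7)
X <- cbind(1, rnorm(5))          # intercept and one regressor
b <- c(0.5, 0.2)                 # made-up coefficients
e_pp_sketch(b, X)
```

Because zeros are excluded, this expectation always exceeds both lambda and 1.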
A numeric vector of predicted expected responses for the observations in df.
oneinfl for fitting one-inflated models.
truncreg for fitting truncated models.
E_pois_noinfl, E_negbin_noinfl for the expected value calculations.
# Example usage
# Shift by 1 so that y contains no zeros
df <- data.frame(x = rnorm(100), y = rpois(100, lambda = 5) + 1)
model <- truncreg(y ~ x, df = df, dist = "Poisson")
predict(model, df = df)
Simulates count data from a one-inflated Poisson process using specified parameters for the rate and one-inflation components.
roipp(b, g, X, Z)
- 'b': A numeric vector of coefficients for the rate component.
- 'g': A numeric vector of coefficients for the one-inflation component.
- 'X': A matrix or data frame of predictor variables for the rate component.
- 'Z': A matrix or data frame of predictor variables for the one-inflation component.
This function generates count data from a one-inflated Poisson process. The process combines:
- A Poisson distribution for counts greater than one.
- A one-inflation component that adjusts the probability of observing a count of one.
The algorithm:
- Calculates the rate parameter ($\lambda$) as $\exp(Xb)$.
- Computes the one-inflation probabilities ($\omega$) based on $Zg$.
- Simulates counts for each observation:
  - Draws a random number to determine whether the count is one.
  - Iteratively calculates probabilities for higher counts until the random number is matched.
This function is useful for generating synthetic data for testing or simulation studies involving one-inflated Poisson models.
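The simulation step of the algorithm can be sketched as an inverse-CDF loop in base R. This is a simplified illustration, not the package's roipp: the hypothetical function roipp_sketch takes the rate lambda and the one-inflation probability w directly for each observation instead of building them from b, g, X, and Z, and it assumes that a count of one has probability w + (1 - w) * PP(1), where PP is the positive Poisson pmf.

```r
# Hypothetical sampler: inverse-CDF draws from a one-inflated positive Poisson.
roipp_sketch <- function(lambda, w) {
  n <- length(lambda)
  y <- integer(n)
  for (i in seq_len(n)) {
    u   <- runif(1)
    k   <- 1L
    pp  <- lambda[i] / (exp(lambda[i]) - 1)   # positive Poisson P(Y = 1)
    cum <- w[i] + (1 - w[i]) * pp             # one-inflated P(Y = 1)
    while (u > cum) {
      # Positive Poisson recursion: P(k + 1) = P(k) * lambda / (k + 1).
      pp  <- pp * lambda[i] / (k + 1)
      k   <- k + 1L
      cum <- cum + (1 - w[i]) * pp            # add the down-weighted mass of the new count
    }
    y[i] <- k
  }
  y
}

set.seed(123)
y <- roipp_sketch(lambda = rep(2, 2000), w = rep(0.3, 2000))
mean(y == 1)                       # simulated share of ones
0.3 + 0.7 * 2 / (exp(2) - 1)       # share of ones implied by the assumed pmf
```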
A numeric vector of simulated count data.
oneinfl for fitting one-inflated models.
# Example usage
set.seed(123)
X <- matrix(rnorm(100), ncol = 2)
Z <- matrix(rnorm(100), ncol = 2)
b <- c(0.5, -0.2)
g <- c(1.0, 0.3)
simulated_data <- roipp(b, g, X, Z)
print(simulated_data)
Simulates count data from a one-inflated, zero-truncated negative binomial (OIZTNB) process using specified parameters for the rate, one-inflation, and dispersion components.
roiztnb(b, g, alpha, X, Z)
- 'b': A numeric vector of coefficients for the rate component.
- 'g': A numeric vector of coefficients for the one-inflation component.
- 'alpha': A numeric value representing the dispersion parameter for the negative binomial distribution.
- 'X': A matrix or data frame of predictor variables for the rate component.
- 'Z': A matrix or data frame of predictor variables for the one-inflation component.
This function generates count data from a one-inflated, zero-truncated negative binomial process. The process combines:
- A negative binomial distribution for counts greater than one.
- A one-inflation component that adjusts the probability of observing a count of one.
The algorithm:
- Calculates the rate parameter ($\lambda$) as $\exp(Xb)$.
- Computes the one-inflation probabilities ($\omega$) based on $Zg$.
- Computes the negative binomial dispersion parameter.
- Simulates counts for each observation:
  - Draws a random number to determine whether the count is one.
  - Iteratively calculates probabilities for higher counts until the random number is matched.
This function is useful for generating synthetic data for testing or simulation studies involving one-inflated, zero-truncated negative binomial models.
A numeric vector of simulated count data.
oneinfl for fitting one-inflated models.
# Example usage
set.seed(123)
X <- matrix(rnorm(100), ncol = 2)
Z <- matrix(rnorm(100), ncol = 2)
b <- c(0.5, -0.2)
g <- c(1.0, 0.3)
alpha <- 1.5
simulated_data <- roiztnb(b, g, alpha, X, Z)
print(simulated_data)
Simulates count data from a zero-truncated Poisson process using specified parameters for the rate component.
rpp(b, X)
- 'b': A numeric vector of coefficients for the rate component.
- 'X': A matrix or data frame of predictor variables for the rate component.
This function generates count data from a zero-truncated Poisson process, which models count data without zeros. The process involves:
- Calculating the rate parameter ($\lambda$) as $\exp(Xb)$.
- Iteratively computing probabilities for counts starting from 1 and adding to the cumulative probability until a randomly drawn value is matched.
This function is useful for generating synthetic data for testing or simulation studies involving zero-truncated Poisson models.
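For comparison with the iterative accumulation described for rpp, zero-truncated Poisson draws can also be obtained in a vectorized way by inverting the Poisson CDF on the part of the unit interval above P(Y = 0). The sketch below uses that alternative technique with base R's qpois; it is not the package's algorithm, the name rpp_sketch is hypothetical, and a log link lambda = exp(Xb) is assumed.

```r
rpp_sketch <- function(b, X) {
  lambda <- exp(drop(X %*% b))        # rate for each row (assumed log link)
  p0     <- exp(-lambda)              # P(Y = 0) under an untruncated Poisson
  u      <- runif(length(lambda))
  # Map u into (p0, 1) so that the Poisson quantile is always at least 1.
  qpois(p0 + u * (1 - p0), lambda)
}

set.seed(123)
X <- cbind(1, rnorm(1000))            # intercept and one regressor
b <- c(0.5, -0.2)
y <- rpp_sketch(b, X)
min(y)                                # smallest simulated count
mean(y)                               # compare with the theoretical mean below
mean(exp(X %*% b) / (1 - exp(-exp(X %*% b))))
```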
A numeric vector of simulated count data.
truncreg for fitting zero-truncated models.
# Example usage
set.seed(123)
X <- matrix(rnorm(100), ncol = 2)
b <- c(0.5, -0.2)
simulated_data <- rpp(b, X)
print(simulated_data)
Performs a Wald test to evaluate the joint significance of a predictor variable in both the rate and one-inflation components of a model.
signifWald(model, varname)
- 'model': A fitted model object of class '"oneinflmodel"'.
- 'varname': A character string specifying the name of the predictor variable to test.
This function tests the null hypothesis that the coefficients for the specified predictor variable are jointly equal to zero in both the rate (beta) and one-inflation (gamma) components of the model. The test statistic is calculated as:
$$W = \theta^{\top} V^{-1} \theta,$$
where $\theta$ is the vector of coefficients for the predictor in the rate and one-inflation components, and $V$ is their variance-covariance matrix. The p-value is computed using a chi-squared distribution with 2 degrees of freedom.
A list with the following components:
- 'W': The Wald test statistic.
- 'pval': The p-value associated with the test statistic, based on a chi-squared distribution with 2 degrees of freedom.
oneinfl for fitting one-inflated models.
oneWald for a general Wald test of one-inflation parameters.
# Example usage
set.seed(123)
# Shift by 1 so that y contains no zeros
df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5) + 1)
model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
result <- signifWald(model, varname = "x")
print(result$W)      # Wald test statistic
print(result$pval)   # p-value
Provides a summary of the fitted model, including estimated coefficients, standard errors, significance levels, and other relevant statistics.
## S3 method for class 'oneinflmodel'
summary(object, ...)
- 'object': A model object of class '"oneinflmodel"' (for one-inflated models).
- '...': Additional arguments (currently unused).
This function generates a detailed summary of the fitted model, including:
- Estimated coefficients for the rate component (beta).
- Estimated coefficients for the one-inflation component (gamma).
- Standard errors.
- z-statistics, associated p-values, and corresponding significance codes.
- Average, and average absolute, one-inflation.
- Log-likelihood of the fitted model.
Prints a summary table of coefficients, standard errors, z-values, p-values, significance codes, one-inflation probabilities, and log-likelihood.
oneinfl for fitting one-inflated models.
# Example usage
# Shift by 1 so that y contains no zeros
df <- data.frame(x = rnorm(100), z = rnorm(100), y = rpois(100, lambda = 5) + 1)
model <- oneinfl(y ~ x | z, df = df, dist = "Poisson")
summary(model)
Provides a summary of the fitted model, including estimated coefficients, standard errors, significance levels, and other relevant statistics.
## S3 method for class 'truncmodel'
summary(object, ...)
- 'object': A model object of class '"truncmodel"' (for truncated models).
- '...': Additional arguments (currently unused).
This function generates a detailed summary of the fitted model, including:
- Estimated coefficients for the rate component (beta).
- Standard errors.
- z-statistics, associated p-values, and corresponding significance codes.
- Log-likelihood of the fitted model.
Prints a summary table of coefficients, standard errors, z-values, p-values, significance codes, and log-likelihood.
truncreg for fitting truncated regression models.
# Example usage
# Shift by 1 so that y contains no zeros
df <- data.frame(x = rnorm(100), y = rpois(100, lambda = 5) + 1)
model <- truncreg(y ~ x, df = df, dist = "Poisson")
summary(model)
Fits a positive Poisson (PP) or zero-truncated negative binomial (ZTNB) regression model.
truncreg(formula, df, dist = "negbin", start = NULL, method = "BFGS")
- 'formula': A symbolic description of the model to be fitted.
- 'df': A data frame containing the variables in the model.
- 'dist': A character string specifying the distribution to use. Options are '"Poisson"' or '"negbin"'.
- 'start': Optional. A numeric vector of starting values for the optimization process. Defaults to 'NULL', in which case the function attempts to choose starting values automatically.
- 'method': A character string specifying the optimization method to be passed to optim. Defaults to '"BFGS"'.
This function fits a regression model for zero-truncated counts. Zero-truncated models are used when the count data does not include zeros, such as in cases where only positive counts are observed.
The function supports two distributions:
- '"Poisson"': Zero-truncated Poisson regression.
- '"negbin"': Zero-truncated negative binomial regression.
The function uses numerical optimization via optim to estimate the parameters.
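To illustrate what estimation via optim involves, the following self-contained sketch fits a positive Poisson regression by maximum likelihood in base R. It assumes a log link for the rate and the standard zero-truncated Poisson log-likelihood; the function name negll_pp, the simulated data, and the starting values are hypothetical, and the package's own likelihood code and starting-value logic may differ.

```r
# Negative log-likelihood of a positive Poisson regression with log link.
negll_pp <- function(b, X, y) {
  lambda <- exp(drop(X %*% b))
  -sum(y * log(lambda) - lambda - lgamma(y + 1) - log(1 - exp(-lambda)))
}

set.seed(1)
n <- 500
x <- rnorm(n)
X <- cbind(1, x)
lambda <- exp(0.5 + 0.3 * x)

# Simulate positive counts by redrawing any zeros (simple rejection step).
y <- rpois(n, lambda)
while (any(y == 0)) {
  zero    <- y == 0
  y[zero] <- rpois(sum(zero), lambda[zero])
}

fit <- optim(par = c(0, 0), fn = negll_pp, X = X, y = y,
             method = "BFGS", hessian = TRUE)
fit$par                          # estimated coefficients
sqrt(diag(solve(fit$hessian)))   # standard errors from the inverse Hessian
-fit$value                       # maximized log-likelihood
```

The inverse of the Hessian of the negative log-likelihood at the optimum serves as an estimate of the variance-covariance matrix of the coefficients.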
An object of class '"truncmodel"' containing the following components:
- 'beta': Estimated coefficients for the regression model.
- 'alpha': Dispersion parameter (only for negative binomial distribution).
- 'vc': Variance-covariance matrix of the estimated parameters.
- 'logl': Log-likelihood of the fitted model.
- 'dist': The distribution used for the model ("Poisson" or "negbin").
- 'formula': The formula used for the model.
summary for summarizing the fitted model.
# Example usage
df <- data.frame(x = rnorm(100), y = rpois(100, lambda = 1) + 1)
model <- truncreg(y ~ x, df = df, dist = "Poisson")
summary(model)