Roth versus traditional accounts in a life-cycle model with tax risk*

MARIE-EVE LACHANCE

doi:10.1017/S1474747212000054

Published online by Cambridge University Press: 15 March 2012

MARIE-EVE LACHANCE*: Affiliation:
Department of Finance, College of Business Administration, San Diego State University, 5500 Campanile Drive, San Diego, CA 92182-8236, USA (e-mail: Marie.Lachance@sdsu.edu)

Abstract

This paper analytically solves a life-cycle model that compares traditional and Roth retirement accounts. It includes realistic features such as tax deductibility of contributions and taxation of withdrawals, tax bracket structure with deductions, taxation of Social Security benefits, and tax risk at retirement. With current taxes, choosing a traditional account over a Roth creates small welfare losses in only a few cases, largely for those with higher incomes and pensions who are subject to the taxation of Social Security benefits. We also investigate tax variability and find that diversified strategies offer only small risk reduction benefits in our illustrations.

Keywords

Life-cycle models; Roth accounts; tax risk

Type: Articles
Information: Journal of Pension Economics & Finance, Volume 12, Issue 1, January 2013, pp. 28–61

DOI: https://doi.org/10.1017/S1474747212000054
Copyright: Copyright © Cambridge University Press 2012

1 Introduction

A number of recent developments have sparked renewed interest in Roth retirement accounts. First, increasing government deficits can lead to higher tax rates in the future, reducing the appeal of traditional tax-deferred Individual Retirement Arrangements (IRAs) and 401(k)s. Second, new provisions in the tax code encourage the conversion of traditional accounts into Roth accounts.¹ Starting in 2010, the $100,000 modified adjusted gross income (MAGI) limit for conversions has been removed and a special clause allows those who make the conversion in 2010 to split the proceeds equally between their 2011 and 2012 tax returns. Third, designated Roth accounts were introduced in the 401(k) arena in 2006. Although employer matching contributions are still restricted to traditional accounts, employees’ contributions can be allocated to a Roth version of a 401(k) or 403(b).

This paper formally examines the differences between the front-loaded (traditional) and back-loaded (Roth) approaches to retirement accounts within the context of a life-cycle model. Building on Yaari's (1965) classic framework with uncertain lifetime and borrowing constraints, we add the following realistic tax features: tax brackets and deductions, taxation of Social Security benefits, impact of retirement contributions/withdrawals on taxable income, and tax risk after retirement. This paper contributes to the literature by providing an analytical solution to the model for three cases: traditional account only, Roth account only, and any fixed combination of traditional and Roth accounts.

The dual approach proposed in this paper is a departure from more conventional dynamic programming techniques used to solve complex problems. This approach has three advantages: it better handles the discontinuity issues introduced by the realistic tax structure, it produces exact values, and it is fast. Working with accurate results is particularly important when examining small differences between two tax systems. The disadvantage of dynamic programming solutions is that they are based on unknown value functions, which must be estimated with time-consuming numerical optimization and interpolation. The broader methodological contribution of this paper is to show that the solution can be extracted instead from a set of known budget constraint equations, which eliminates the need to estimate unknown value functions and yields exact values. Further, the structure of the solution allows us to express the budget constraint with a series of closed-form equations. In terms of limitations, the model's main drawback is that it does not currently incorporate sources of background risk such as income risk, risky asset returns, or medical expense risk.

The solution derived in this paper allows us to investigate three questions: Who benefits from Roth accounts? How does tax risk impact on the comparison? How does a policy of tax-deductible contributions impact on retirement savings? To answer these questions, we follow an approach commonly used in this literature and illustrate the model's solution with realistic income profiles for three education groups: less than high school, high school, and college. We also add a new dimension to the results by presenting them for three age cohorts (ages 25, 45, and 65 in 2010) and different levels of pension income.

When comparing traditional and Roth accounts, the standard argument is that traditional front-loaded accounts are a better option when marginal tax rates decline after retirement (see e.g. Engen et al., 1994).² The conventional wisdom is that this is generally the case and accordingly, traditional accounts have been favored historically. For example, Butterfield et al. (2000) conclude that ‘traditional IRAs have significant wealth accumulation advantages over Roth IRAs in all but rare circumstances’. Our solution shows that this is not necessarily the case when the following tax loophole is taken into account: withdrawals from traditional accounts (but not from Roth accounts) are considered when determining the amount of taxable Social Security benefits. For those affected, this implies that marginal tax rates after retirement effectively increase by either 50% or 85%.³ Depending on the specific combination of tax rates, this either mitigates the tax decline after retirement or leads to an increase.

By contrasting expected utility in the pure Roth and traditional cases for the scenarios considered, we find that traditional accounts provide unambiguous benefits over their Roth counterparts mostly for those who pay no or little taxes after retirement.⁴ For those at the other end of the spectrum with higher incomes and pensions, traditional accounts generate only small gains or losses due to the taxation of Social Security benefits. Currently, this issue affects mostly retirees in the group with a college degree, but the situation will be different for the younger generation as Social Security's taxation thresholds are not indexed for inflation.

Roth accounts may also become the preferred option if future tax rates increase. For instance, Kotlikoff et al. (2008) showed that they would be more appealing in a 30% tax hike scenario. We perform a different exercise by solving for the breakeven percentage increase in future tax rates required to make individuals indifferent between Roth and traditional accounts. For the group who pays no or minimal taxes after retirement, traditional accounts are still optimal even with a significant increase in taxes. For those who pay meaningful taxes, we find that the breakeven increases range from 4% to 41%. Although a modest increase in future taxes could tip the scale in favor of Roth accounts for those with high pensions, a more radical change would be required for those without pensions.

To address concerns about tax risk, diversified strategies that mix traditional and Roth accounts have been commonly suggested. Yet, little formal evidence has been provided to support them. Dickson (2004) is one of the rare studies to include tax variability in a two-period model with both traditional and Roth accounts.⁵ In this paper, we extend the concept of tax variability to a life-cycle framework, which allows us to measure the benefits of tax diversification in a realistic setup. We consider a simple naïve diversification strategy where, in every period, the individual allocates the same fraction of his savings/withdrawals to the traditional and Roth accounts. We solve for the optimal fraction to invest in traditional accounts in the cases with and without tax risk and find that the optimal allocation is essentially the same in both cases. In the scenario without tax risk, the option to invest a fraction of savings in Roth accounts can be worth a few thousand dollars because it can help avoid higher marginal tax rates. In contrast, risk reduction benefits have a more limited impact on the allocation decision because they amount to a few hundred dollars at best.⁶

Finally, our model allows us to examine the differential impact of traditional accounts on retirement savings and consumption.⁷ Our results indicate that, although traditional accounts can increase gross retirement savings substantially for those with a college degree (by up to $68,000 in the baseline scenario), most of this increase vanishes once we take into account the present value of taxes that will have to be paid on withdrawals. In other words, for that group traditional accounts increase the size of assets under management, but not necessarily retirement consumption. Those in the less-than-high-school group present a different story as they do not pay taxes after retirement and actually display the highest increase in retirement consumption at $15,000. Decomposing the increase into an income effect (from the tax subsidy) and a substitution effect (from increased savings before retirement), we find that the substitution effect has more impact on retirement consumption because a large fraction of the tax subsidy is used to increase consumption before retirement.

The remainder of this paper is structured as follows. Section 2 describes the model and Section 3 provides the solution. Section 4 lists the assumptions used in the numerical illustrations and Section 5 illustrates some representative cases. Section 6 analyzes who gains from Roth accounts and considers the potential benefits of mixed strategies. Section 7 details the impact of tax-deductible contributions on the level of retirement savings. Section 8 concludes with some suggestions for applications and future research.

2 Model

The model builds on Yaari's (1965) classic life-cycle framework, which features borrowing constraints and an uncertain lifetime. The model's contribution to the life-cycle literature is to incorporate many realistic features of the tax treatment of retirement savings, while maintaining an analytical structure. First, the model can represent either Roth or traditional accounts. In the Roth case, contributions and withdrawals do not affect taxes. In the traditional case, contributions are deductible from taxable income and withdrawals are taxed. The accounts can be part of an IRA or a 401(k). Second, while many models use a single fixed tax rate, this model reflects the United States tax structure with various tax brackets and deductions. Third, the model lets tax rates be endogenously determined rather than being specified exogenously. Fourth, the model incorporates the actual rules for the taxation of Social Security benefits. Fifth, the model allows for tax risk after retirement.

Although it would also be interesting to let the individual choose between Roth and traditional accounts in each period, this setup would make the problem more cumbersome to solve. The model can be used, however, to offer a solution for the simpler case of a naïve diversification approach. With that strategy, the proportions allocated to the traditional and Roth accounts are, respectively, α and 1 − α in every period. This solution allows us to illustrate the potential value of diversification in a context with tax risk. In the remainder of the text, all equations are given as a function of α since the pure cases can be viewed as special cases as follows:

(1)

$$\begin{array}{ll} \alpha = 1 & \text{Traditional account only.} \\ \alpha \in (0,1) & \text{Fraction } \alpha \text{ in traditional account and } 1 - \alpha \text{ in Roth account.} \\ \alpha = 0 & \text{Roth account only.} \end{array}$$

To balance the increased complexity brought by realistic taxes, straightforward assumptions are used for the rest of the model. This approach has the advantage of leading to a solution with exact values which lends itself well to analysis. For the elements not included in this model, the reader is referred to: Cocco et al. (2005) (income risk and portfolio choice), Love (2007) (employer contributions), and Kotlikoff et al. (2008) (married individuals).

2.1 Economic and demographic assumptions

The individual is age $t_0$ when the problem starts and he can live up to age $\omega$. The probability that he survives from age $t_0$ to age $t$ is denoted by $p_{t_0,t}$. It is assumed that $p_{t_0,t}$ is continuous, decreases with time, and eventually converges to zero as $t \to \omega$. The utility of consumption is represented by an increasing and concave function $u(c)$ with $u'(c) > 0$, $u''(c) < 0$, $\lim_{c \to 0} u'(c) = \infty$, and $\lim_{c \to \infty} u'(c) = 0$. Time preferences are taken into account by discounting utility at a continuous rate $\beta$. The combined discount from time and mortality is denoted by

(2)

$$f(t) = \mathrm{e}^{-\beta (t - t_0)} \, p_{t_0,t}.$$

All economic assumptions are expressed in real terms. Savings grow at a real risk-free rate $r > 0$. The wealth process is denoted by $W_t$ and it is assumed that the individual starts the problem with no initial savings, i.e. $W_{t_0} = 0$. Borrowing is not allowed. Before retirement, pre-tax income is a continuous function denoted by $y_t$. The individual retires at age $R$, which is assumed to be 65 years old. After retirement, the individual receives a Social Security pension with annual payments of $\mathrm{SS} > 0$. He may also receive annual income $y_R \geqslant 0$ from an employer pension or another source of annuity payments, for a total of $y_t = y_R + \mathrm{SS}$ per year. The solution considers only the case where expected after-tax income declines at the time of retirement.

2.2 Taxation

Before retirement, income is subject to a payroll tax rate $\pi$. At all times, federal income taxes based on the United States tax bracket system apply.⁸ For $k = 0, \ldots, K$, the brackets are denoted by $[B_k, B_{k+1})$ and the marginal tax rate within bracket $k$ is $\tau_k$. Tax rates after retirement are subject to a one-time multiplicative shock $\theta$ that applies to all marginal tax rates. To represent this risk, the model accommodates any discrete probability distribution for $\theta$. Let $N$ be the number of possible states; with probability $p_i$, $i = 1, \ldots, N$, we have $\theta = \theta_i$ and $\tau_k^i = \theta_i \tau_k$. The case without tax risk can be viewed as a special case of this model with $N = 1$ and $\theta_1 = 1$. For the remainder of this paper, all functions of $\tau_k$ should be interpreted as random variables after retirement although the notation is not differentiated.

To compute taxable income, the standard deduction and personal exemption must be taken into account. The sum of these two components is denoted by E before age 65 and by E_R after age 65 (there is an increase in the standard deduction at age 65). In the traditional case, it must also be recognized that taxable income is reduced by contributions before retirement and increased by withdrawals after retirement. The contributions or savings before retirement are denoted by s_t where s_t=y_t−c_t − tax_t(s_t) and c_t is consumption. A negative s_t represents a withdrawal. Let SS_t^tx denote the taxable portion of Social Security benefits, the taxable income function is given by

(3)

$$y_t^{\mathrm{tx}}(s_t) = \begin{cases} y_t - \alpha s_t - E, & t < R, \\ y_R + \mathrm{SS}_t^{\mathrm{tx}}(s_t) - \alpha s_t - E_R, & t \geqslant R. \end{cases}$$

Accordingly, for $k = 0, \ldots, K$, the tax function is

(4)

$$\mathrm{tax}_t(s_t) = \pi y_t \cdot 1(t < R) + \tau_k \, y_t^{\mathrm{tx}}(s_t) + G_k, \qquad B_k \leqslant y_t^{\mathrm{tx}}(s_t) < B_{k+1},$$

where

(5)

$$G_k = \sum_{j=1}^{k-1} \tau_j (B_{j+1} - B_j) - \tau_k B_k.$$
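As a rough illustration of equations (3) to (5), the sketch below evaluates a piecewise-linear tax of the form τ_k·y + G_k for the pre-retirement case. The bracket bounds, rates, and function names are hypothetical and not taken from the paper, and the index here starts at the first taxable bracket (an assumption that may differ from the paper's k = 0 convention).

```python
from bisect import bisect_right

# Hypothetical schedule (illustrative numbers, not the paper's calibration).
# BOUNDS[k] is the lower bound B_k of bracket k; RATES[k] is its marginal rate tau_k.
BOUNDS = [0.0, 8_000.0, 34_000.0, 82_000.0]
RATES = [0.10, 0.15, 0.25, 0.28]


def intercepts(bounds, rates):
    """Intercepts G_k that keep tau_k * y + G_k continuous at every bound B_k."""
    g = [-rates[0] * bounds[0]]
    tax_at_bound = 0.0  # tax owed exactly at the lower bound of bracket k
    for k in range(1, len(bounds)):
        tax_at_bound += rates[k - 1] * (bounds[k] - bounds[k - 1])
        g.append(tax_at_bound - rates[k] * bounds[k])
    return g


G = intercepts(BOUNDS, RATES)


def income_tax(taxable_income):
    """Income tax tau_k * y + G_k for the bracket that contains y."""
    if taxable_income <= BOUNDS[0]:
        return 0.0  # no tax below the first bound
    k = bisect_right(BOUNDS, taxable_income) - 1
    return RATES[k] * taxable_income + G[k]


def tax_before_retirement(y, s, alpha, exemption, payroll_rate):
    """Pre-retirement case: payroll tax plus income tax on y - alpha * s - E."""
    taxable = y - alpha * s - exemption
    return payroll_rate * y + income_tax(taxable)
```

With α = 1 a contribution lowers taxable income one-for-one, while with α = 0 (the Roth case) it leaves the tax bill unchanged.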

It should be noted that some of the features of the tax system are not included in the model because they do not affect the solution in most cases illustrated here. For instance, the model does not limit the amount of tax-deductible contributions because the optimal savings in all our illustrations are below the 401(k) limit of $16,500. In addition, the 10% penalty tax for early withdrawals before age 59½ is not incorporated because the model does not feature shocks that would trigger them. This eventuality is discussed later in Section 6.3.

The model is completed by specifying SS_t^tx(s_t), the taxable portion of Social Security benefits. This is a fairly complex formula. First, a provisional income measure PI_t ≡ PI(s_t) is defined by adding half of Social Security benefits to other sources of income (including withdrawals from traditional accounts, but not Roth accounts), yielding:

(6)

$$\text{Provisional Income}_t = \mathrm{PI}(s_t) = y_R + \frac{\mathrm{SS}}{2} - \alpha s_t, \qquad t \geqslant R.$$

Once PI_t is computed, it has to be compared to two ‘base amounts’ X¹ and X². Currently, these base amounts are X¹ = $25,000 and X² = $34,000 for singles; the corresponding measures are $32,000 and $44,000 for those married filing jointly. These amounts are not indexed for inflation. Given that the rest of the model is expressed in real terms, these thresholds have to be adjusted for inflation (denoted by i) and this creates cohort effects in the model. For example, for somebody retiring at age 65 in 2010, the first base amount in real terms is $25,000. For somebody currently age 45 in 2010, this same base amount will be $13,720 when they retire in 20 years (assuming i = 3%), making it more likely that they will be taxed. For a given individual, the base amounts will also decrease by a factor of e^(−it) in real terms after t years of retirement. Thus, let a be the current age in 2010, the base amounts in real terms can be expressed by

(7)

$$X_t^1 = X^1 \mathrm{e}^{-i(t-a)} \qquad \text{and} \qquad X_t^2 = X^2 \mathrm{e}^{-i(t-a)}.$$
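The cohort example above can be checked numerically with equation (7). This is a minimal sketch; the function name and argument names are illustrative choices, not the paper's.

```python
import math


def real_base_amount(nominal_base, inflation, years_since_2010):
    """Base amount in 2010 dollars after continuous inflation erosion, as in equation (7)."""
    return nominal_base * math.exp(-inflation * years_since_2010)


# Someone aged 45 in 2010 who retires 20 years later, with i = 3%:
first_base_at_retirement = real_base_amount(25_000.0, 0.03, 20)
print(round(first_base_at_retirement))  # 13720
```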

If the provisional income $\mathrm{PI}_t$ is less than the first base amount $X_t^1$, none of the Social Security benefits are taxable. If $\mathrm{PI}_t$ is greater than the first base amount, but lower than the second, 50% of the excess $\mathrm{PI}_t - X_t^1$ is taxable income.⁹ If $\mathrm{PI}_t$ is greater than the second base amount, then the taxable portion is $50\%\,(X_t^2 - X_t^1) + 85\%\,(\mathrm{PI}_t - X_t^2)$, subject to a maximum of 85% of Social Security benefits. Thus, depending on the level of $\mathrm{PI}_t$, there are four different formulas that can apply: 0, $50\%\,(\mathrm{PI}_t - X_t^1)$, $50\%\,(X_t^2 - X_t^1) + 85\%\,(\mathrm{PI}_t - X_t^2)$, and 85% SS. To summarize, these four cases can be embedded into one linear structure:

(8)

$$\mathrm{SS}_t^{\mathrm{tx}}(s_t) = M_h \, \mathrm{PI}(s_t) + H_{t,h}, \qquad B_{t,h}^S \leqslant \mathrm{PI}(s_t) < B_{t,h+1}^S,$$

where

(9)
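The display for equation (9) does not appear above. One reconstruction that is consistent with the four cases listed before equation (8), and with the later statement that $M_h$ drops from 85% to 0 at $h = 3$, is sketched below. The indexing $h = 0, \ldots, 3$ and the lowest threshold are assumptions, not the original display.

$$\begin{array}{c|ccc} h & M_h & H_{t,h} & B_{t,h}^S \\ \hline 0 & 0 & 0 & -\infty \\ 1 & 0.50 & -0.50\, X_t^1 & X_t^1 \\ 2 & 0.85 & 0.50\,(X_t^2 - X_t^1) - 0.85\, X_t^2 & X_t^2 \\ 3 & 0 & 0.85\, \mathrm{SS} & X_t^2 + \dfrac{0.85\, \mathrm{SS} - 0.50\,(X_t^2 - X_t^1)}{0.85} \end{array}$$

Each row reproduces one of the four formulas: for instance, with $h = 2$, $M_h \mathrm{PI}_t + H_{t,h} = 0.50\,(X_t^2 - X_t^1) + 0.85\,(\mathrm{PI}_t - X_t^2)$, and the last threshold is the level of $\mathrm{PI}_t$ at which that expression reaches $0.85\,\mathrm{SS}$.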

The key notation to remember for the rest of the analysis is M_h, which represents the marginal rate of inclusion of Social Security benefits in taxable income. From equations (6) and (8), a dollar withdrawn from a traditional account increases the taxable portion of Social Security benefits by M_h and triggers an additional tax of τ_kM_h. Thus, M_h can be viewed as a factor that magnifies the marginal tax rate from τ_k to τ_k(1+M_h). For the remainder of this paper, τ_k(1+M_h) will be referred to as the effective marginal tax rate. For example, if τ_k = 15% and M_h = 85%, withdrawals from traditional accounts are effectively taxed at a rate of 15% × (1 + 85%) = 27.75%.
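The four-case rule and the effective marginal rate can be sketched in a few lines. The function names are hypothetical, and the code follows the four formulas as stated in the text rather than the full statutory worksheet.

```python
def taxable_social_security(pi, ss, x1, x2):
    """Return (taxable benefits, marginal inclusion rate M_h) for provisional income pi.

    x1 and x2 are the two base amounts; ss is the annual Social Security benefit.
    """
    if pi < x1:
        return 0.0, 0.0
    if pi < x2:
        amount, rate = 0.50 * (pi - x1), 0.50
    else:
        amount, rate = 0.50 * (x2 - x1) + 0.85 * (pi - x2), 0.85
    cap = 0.85 * ss
    if amount >= cap:
        return cap, 0.0  # at the 85% maximum, extra income adds no taxable benefits
    return amount, rate


def effective_marginal_rate(tau, m):
    """Effective rate tau_k * (1 + M_h) on a traditional-account withdrawal."""
    return tau * (1.0 + m)
```

For example, `effective_marginal_rate(0.15, 0.85)` reproduces the 27.75% figure in the text, and a retiree whose taxable benefits have reached the cap faces `M_h = 0` again.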

To put this in perspective, recall that the basis for favoring traditional accounts is that marginal tax rates after retirement should be lower than before retirement. Once the taxation of Social Security benefits is taken into account, this logic may no longer hold as it is possible to have lower income after retirement, but higher tax rates. This issue does not necessarily affect everyone: those with either very low or very high incomes are not impacted because their inclusion rate M_h is zero. It should be kept in mind that the composition of this group varies across cohorts because the base amounts change in real terms. Initially, those with higher incomes are subject to the 50% or 85% inclusion rates. Eventually, they will see their M_h decline to zero when their taxable benefits reach 85% SS. On the other hand, those with lower incomes start with M_h = 0 but ultimately see their inclusion rates increase to 50% and 85% as the base amounts decrease in real terms.

2.3 Optimization problem

Combining these assumptions, the individual's problem is to choose consumption to maximize his expected utility, subject to a budget constraint and borrowing constraints. Thus, the optimization problem is

(10)

$$\max_{c > 0} \; E\left[ \int_{t_0}^{\omega} f(t)\, u(c_t)\, \mathrm{d}t \right]$$

such that

(11)

$$\mathrm{d}W_t = \left[ W_t r + y_t - c_t - \mathrm{tax}_t(s_t(c_t)) \right] \mathrm{d}t$$

and

(12)

$$W_t \geqslant 0 \quad \forall t, \qquad W_{t_0} = 0,$$

where $\mathrm{tax}_t(s_t)$ is given in (4) and $s_t(c_t)$ by

(13)

$$s_t(c_t) = \frac{y_t - \mathrm{tax}_t(0) - c_t}{1 - \alpha \tau_k (1 + M_h)}$$

for $B_k \leqslant y_t^{\mathrm{tx}}(s_t) < B_{k+1}$ and $B_{t,h}^S \leqslant \mathrm{PI}(s_t) < B_{t,h+1}^S$ with $y_t^{\mathrm{tx}}(s_t)$ given in (3) and $\mathrm{PI}(s_t)$ given in (6).
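Equation (13) follows in one step from the definitions in Section 2.2. The sketch below assumes that $\mathrm{tax}_t(0)$ is evaluated with the coefficients of the same bracket pair $(k, h)$ as $s_t$, so that the tax function is linear in $s_t$ over the relevant range (with $M_h = 0$ before retirement):

$$\begin{aligned} s_t &= y_t - c_t - \mathrm{tax}_t(s_t), \\ \mathrm{tax}_t(s_t) &= \mathrm{tax}_t(0) - \alpha \tau_k (1 + M_h)\, s_t, \\ \Rightarrow \quad s_t \left[ 1 - \alpha \tau_k (1 + M_h) \right] &= y_t - \mathrm{tax}_t(0) - c_t. \end{aligned}$$

The second line uses equations (3), (6) and (8): each dollar of $s_t$ lowers taxable income by $\alpha$ directly and, after retirement, by a further $\alpha M_h$ through the taxable portion of Social Security benefits.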

2.4 Discontinuity points for τ_k and M_h

The rates $\tau_k$ and $M_h$ in this model are step functions of consumption and these discontinuities affect the structure of the solution in the next section. To facilitate the exposition of the solution, the notations $C_k$ and $C_h$ are introduced to identify the levels of consumption where, respectively, the marginal tax rate jumps from $\tau_{k-1}$ to $\tau_k$ and the rate of inclusion of Social Security benefits jumps from $M_{h-1}$ to $M_h$. We can solve for $C_k$ by setting taxable income equal to the next tax bracket (i.e. $y_t^{\mathrm{tx}}(s(C_k)) = B_k$) and inverting $C_k$ from equation (13) as follows:¹⁰

(14)

$$C_k = \begin{cases} y_t \left( 1 - \pi - \dfrac{1}{\alpha} \right) + B_k \left( \dfrac{1}{\alpha} - \tau_k \right) - G_k + \dfrac{E}{\alpha}, & t < R, \\[1ex] y_R \left( 1 - \dfrac{1}{\alpha} \right) + \mathrm{SS} \left( 1 - \dfrac{M_h / 2}{\alpha (1 + M_h)} \right) + B_k \left( \dfrac{1}{\alpha (1 + M_h)} - \tau_k \right) - G_k + \dfrac{-H_{t,h} + E_R}{\alpha (1 + M_h)}, & t \geqslant R. \end{cases}$$

Similarly, $C_h$ is derived by setting the provisional income in equation (6) equal to the next threshold $B_{t,h}^S$ to obtain:

(15)

$$\begin{aligned} C_h = {} & y_R \left( 1 - \frac{1}{\alpha} \right) + \mathrm{SS} \left( 1 + \frac{\tau_k \alpha - 1}{2\alpha} \right) \\ & + B_{t,h}^S \left( \frac{1}{\alpha} - \tau_k (1 + M_h) \right) - G_k - \tau_k (H_{t,h} - E_R). \end{aligned}$$

3 Solution

We derive a complete solution and proof for the optimization problem in (10)–(13). This section outlines the solution's key components and we refer to Appendix A for additional technical details. Readers less interested in the derivation of the results can proceed to the illustrations in Section 4. In a problem with borrowing constraints, the binding periods with $W_t^* = 0$ must be identified because the solution is different in these periods as the individual simply consumes all his after-tax income. To simplify the exposition in this section, we consider the case where the individual starts saving for retirement at age $T_1$ and exhausts his savings by age $T_2$, i.e. we have the following structure:

3.1 Lagrangian and dual approach

Solving the problem with conventional methods can be challenging because we need to prove that the borrowing constraint is satisfied everywhere. Without restricting the model's assumptions, this is a difficult task because the optimal consumption function provides little insight into the sign of the wealth process. In this context, it is useful to turn to the dual approach suggested in Lachance (2012) and extend it to include discontinuities and risks. Dual approaches are used with problems that are difficult to solve in their primal form, but are easier to handle in their equivalent dual form. In this case, the advantage is that we do not need to prove directly that $W_t^* \geqslant 0$ everywhere and this allows us to get a simpler condition.

As with conventional optimization methods, to prove that a solution is optimal the problem's Lagrangian must be derived first. Appendix A.1 describes how the standard Lagrangian is constructed and rewrites it in a form suitable for the dual approach as follows:

(16)

$$L = E\left[ \int_{t_0}^{\omega} \left( f(t)\, u(c_t) + X(t)\, \mathrm{e}^{-r(t - t_0)}\, s_t(c_t) \right) \mathrm{d}t \right].$$

The process $X(t)$ is just a transformation of the original Lagrange multipliers; it must be decreasing during the binding periods and equal to a constant $\lambda$ during the non-binding period $[T_1, T_2]$. With tax risk after retirement, a different constant $\lambda_i$ applies in each state $i$. By construction, the constants are related by $\lambda = \sum_{i=1}^{N} p_i \lambda_i$ and can be interpreted as the marginal utility of wealth. Less technically, these conditions on $X(t)$ mean that expected utility cannot be improved by saving an additional dollar at time $t$ and spending it later (and vice-versa).

The first step of the dual approach is to find $c_t^*$ that maximizes $L$, which is a simple unconstrained maximization problem detailed in Section 3.2. The second step of the dual approach is to find $X(t)$ that minimizes $L$, which in our setup is equivalent to:

• Solving for the constants $\lambda$ and $\lambda_i$ that satisfy the budget constraint equations $W(\lambda, \lambda_i) = 0$, $i = 1, \ldots, N$, and the condition $\lambda = \sum_{i=1}^{N} p_i \lambda_i$.
• If there is more than one constant $\lambda$ or $\lambda_i$ that satisfies the budget constraint, the highest one is optimal.

A solution that satisfies these criteria maximizes utility and satisfies all the problem's constraints. Appendix A.2 details the equation for the budget constraint W(λ, λ_i) = 0 and Appendix A.3 explains how the bisection method can be employed to solve for the constants λ and λ_i.
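To give a feel for the bisection step, the sketch below solves for λ in a deliberately stripped-down setting: no taxes, no tax risk, survival probability equal to one, CRRA marginal utility, and an annual grid. All parameter values and function names are illustrative assumptions, and this is not the budget constraint of Appendix A.2. In this simplified case the budget gap is monotone in λ, so the root is unique and the "highest root" rule is not needed.

```python
import math

# Illustrative parameters (assumptions, not the paper's calibration).
GAMMA = 2.0     # CRRA curvature, u'(c) = c ** (-GAMMA)
BETA = 0.03     # continuous utility discount rate
R_RATE = 0.03   # real risk-free rate r
HORIZON = 60    # years from t0 to the maximum age, on an annual grid


def income(t):
    """Hypothetical after-tax income: 50,000 while working, 20,000 once retired."""
    return 50_000.0 if t < 40 else 20_000.0


def consumption(lam, t):
    """c_t*(lambda) from the F.O.C. with no taxes and survival probability 1."""
    f_t = math.exp(-BETA * t)
    return (lam * math.exp(-R_RATE * t) / f_t) ** (-1.0 / GAMMA)


def budget_gap(lam):
    """Present value of income minus consumption; increasing in lambda."""
    return sum(math.exp(-R_RATE * t) * (income(t) - consumption(lam, t))
               for t in range(HORIZON))


def solve_lambda(lo=1e-12, hi=1e-6, iterations=200):
    """Bisection: repeatedly halve [lo, hi] around the root of budget_gap."""
    assert budget_gap(lo) < 0.0 < budget_gap(hi), "root must be bracketed"
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if budget_gap(mid) > 0.0:
            hi = mid  # consumption too low: lambda is too high
        else:
            lo = mid  # consumption too high: lambda is too low
    return 0.5 * (lo + hi)


lam_star = solve_lambda()
```

Because the discount rate equals the interest rate here, the implied consumption path is flat, which gives a simple sanity check on the solved λ.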

3.2 Optimal consumption and issues with discontinuities

To find $c_t^*$ that maximizes $L$ unconditionally when $X(t) = \lambda$, the standard procedure is to derive the first-order condition (F.O.C.) as follows:

(17)

$$\begin{aligned} & L'(c_t) = f(t)\, u'(c_t) - \frac{\lambda \mathrm{e}^{-r(t - t_0)}}{1 - \alpha \tau_k (1 + M_h)}, \\ & \text{F.O.C.:} \quad L'(c_t^*) = 0 \quad \Rightarrow \quad c_t^*(\lambda) = u'^{-1}\left( \frac{\lambda \mathrm{e}^{-r(t - t_0)}}{f(t)\, (1 - \alpha \tau_k (1 + M_h))} \right). \end{aligned}$$

The first component in $L'(c_t)$ measures the marginal benefit of consumption and the second one the opportunity cost of saving where $s_t'(c_t) = -1/(1 - \alpha \tau_k (1 + M_h))$. With traditional accounts, savings do not decrease one-for-one with consumption because taxes create an additional reward (or penalty for withdrawals). In that case, higher taxes translate into higher savings before retirement and lower withdrawals after retirement.

When $L'(c_t)$ is continuous, $L''(c_t) < 0$ and there is a unique solution to the F.O.C. $L'(c_t^*) = 0$. When $\tau_k$ and $M_h$ enter $L'(c_t)$, this cannot be taken for granted because the discontinuities raise new technical issues: (1) the F.O.C. may not have a solution $c_t^*$ for every $\lambda$, (2) the F.O.C. can have more than one solution $c_t^*$ for a given $\lambda$, and (3) circularity, i.e. $\tau_k$ is a function of $c_t^*$ and $c_t^*$ is a function of $\tau_k$. Note that these discontinuity issues can also arise in retirement problems where retirement savings are rewarded (or withdrawals penalized) and the reward/penalty can suddenly change. For example, this would be the case if an employer offers a 50% match on employee contributions, but stops the match for contributions above 6%.

For the first issue, recall that Section 2.4 defined the discontinuity points C_k and C_h where τ_k and M_h jump. Here, the notation C will stand for any of these points. The problem is that it is not possible to solve for c in L′(c) = 0 when at a discontinuity point C we have L′(C⁻) > 0 and L′(C) < 0. Actually, with L″(c) < 0, C is locally optimal because L(c) increases up to C and decreases thereafter. Since the equation for L′(C) in (17) is a function of λ, it will be convenient to rewrite the condition in terms of a range of values for λ as follows:

(18)

$$L'(C^-) > 0 \quad \text{and} \quad L'(C) < 0 \;\text{ is equivalent to }\; \Lambda_t(C) < \lambda < \Lambda_t(C^-),$$

where the function

(19)

$$\Lambda_t(c) = u'(c)\, f(t)\, \mathrm{e}^{r(t - t_0)}\, (1 - \alpha \tau_k (1 + M_h))$$

is simply introduced to express the solution more compactly later on.¹¹ Outside these values for $\lambda$, the standard F.O.C. solution in (17) applies. This approach with brackets for $\lambda$ offers a simple test to determine when an algorithm should use the standard F.O.C. solution in (17) and when it should use the discontinuity points $C$. It also has the advantage of addressing the circularity issue: for a given $\lambda$, we know which $\tau_k$ or $M_h$ to use in $c_t^*(\lambda)$.
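The bracket test for λ can be sketched for a single discontinuity point. The example assumes one kink at consumption level `kink` where the marginal rate rises from `tau_below` to `tau_above` (the pre-retirement direction with a traditional account), CRRA marginal utility, survival probability one, and M_h = 0. The function names and numbers are hypothetical.

```python
import math


def big_lambda(c, tau, t, alpha, gamma, beta, r):
    """Lambda_t(c) from equation (19), with u'(c) = c ** (-gamma), f(t) = exp(-beta * t), M_h = 0.

    t is measured in years since t0.
    """
    return c ** (-gamma) * math.exp(-beta * t) * math.exp(r * t) * (1.0 - alpha * tau)


def optimal_consumption(lam, kink, tau_below, tau_above, t, alpha, gamma, beta, r):
    """Choose between the F.O.C. solution and the kink using the bracket for lambda."""
    upper = big_lambda(kink, tau_below, t, alpha, gamma, beta, r)  # Lambda_t(C^-)
    lower = big_lambda(kink, tau_above, t, alpha, gamma, beta, r)  # Lambda_t(C)
    if lower < lam < upper:
        return kink  # L'(C^-) > 0 and L'(C) < 0: the kink is optimal
    # Outside the bracket, lambda tells us which rate applies, so there is no circularity.
    tau = tau_below if lam >= upper else tau_above
    f_t = math.exp(-beta * t)
    return (lam * math.exp(-r * t) / (f_t * (1.0 - alpha * tau))) ** (-1.0 / gamma)


# Example: kink at 30,000 where the rate rises from 15% to 25%, traditional account only.
args = dict(kink=30_000.0, tau_below=0.15, tau_above=0.25,
            t=0.0, alpha=1.0, gamma=2.0, beta=0.03, r=0.03)
c_low = optimal_consumption(1.2e-9, **args)   # high lambda: interior solution below the kink
c_kink = optimal_consumption(9.0e-10, **args)  # lambda inside the bracket: consume at the kink
c_high = optimal_consumption(5.0e-10, **args)  # low lambda: interior solution above the kink
```

A whole interval of λ values maps to the same consumption level at the kink, which is why the F.O.C. alone has no solution there.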

The second problem that we can encounter is having more than one locally optimal solution. This happens when L′(c_t) increases at a discontinuity, i.e. in the uncommon scenario where the effective marginal tax rate decreases with consumption. In this model, this occurs at the point $C_{h=3}$ where taxable Social Security benefits reach the 85% maximum and M_h drops from 85% to 0%. Specifically, when $L'(C_{h=3}^-) < 0$ and $L'(C_{h=3}) > 0$, there are two locally optimal solutions $c_1 < C_{h=3}$ and $c_2 > C_{h=3}$ (with h < 3 for $c_1$ and h = 3 for $c_2$). The globally optimal one maximizes L(c_t). To express this more systematically, we solve for a point $\lambda = \bar{\Lambda}$ such that the individual is indifferent between $c_1$ and $c_2$. It is possible to show that $c_1$ is optimal for all $\lambda > \bar{\Lambda}$, $c_2$ is optimal for all $\lambda < \bar{\Lambda}$, and the solution jumps between $c_1$ and $c_2$ when $\lambda = \bar{\Lambda}$.Footnote ¹² Figure 1 makes this result more intuitive by illustrating the solution graphically for the cases $\lambda < \bar{\Lambda}$, $\lambda = \bar{\Lambda}$, and $\lambda > \bar{\Lambda}$.

Figure 1. Illustration of jump in optimal consumption (optimal c_t maximizes L(c_t)).

Combining these results, the optimal solution in the non-binding period after retirement, t ∈ [R, T_2), can be condensed as:

(20)

$$c_t^*(\lambda) = \begin{cases} u'^{-1}\left( \dfrac{\lambda e^{-r(t - t_0)}}{f(t)\left(1 - \alpha \tau_k (1 + M_h)\right)} \right), & \max\left[ \Lambda_t(C_{k+1}^-),\, \Lambda_t(C_{h+1}^-) \right] \leq \lambda \leq \min\left[ \Lambda_t(C_k),\, \Lambda_t(C_h) \right], \\ C_h, & \Lambda_t(C_h) < \lambda < \Lambda_t(C_h^-), \quad h < 3, \\ C_k, & \Lambda_t(C_k) < \lambda < \Lambda_t(C_k^-), \end{cases}$$

for $\lambda \geq \bar{\Lambda}$ and

(21)

$$c_t^*(\lambda) = \begin{cases} u'^{-1}\left( \dfrac{\lambda e^{-r(t - t_0)}}{f(t)\left(1 - \alpha \tau_k\right)} \right), & \Lambda_t(C_{k+1}^-) \leq \lambda \leq \Lambda_t(C_k), \\ C_k, & \Lambda_t(C_k) < \lambda < \Lambda_t(C_k^-), \end{cases}$$

for $\lambda < \bar{\Lambda}$. Before retirement, the points C_h related to Social Security do not apply and the solution in the non-binding period [T_1, R) is given by (21) for all λ. In the pure Roth case (α = 0), only the top part of the solution applies. In other words, working with Roth accounts in optimization problems is much easier than working with traditional accounts.

If the utility function takes the standard power utility form $u(c) = c^{1-\gamma}/(1-\gamma)$ with γ ≠ 1, the top portion of the solution in (20) and (21) becomes:

(22)

$$c_t^*(\lambda) = \left( \frac{\lambda e^{(\beta - r)(t - t_0)}}{p_{t_0,t}\left(1 - \alpha \tau_k (1 + M_h)\right)} \right)^{-1/\gamma}.$$

Appendix B shows how this equation can be combined with a flexible set of assumptions for mortality and income to express the budget constraint W(λ, λ_i) = 0 as a series of closed-form equations.
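To make the bracket test concrete, the following Python sketch evaluates the pre-retirement rule in (21) with the power-utility form in (22). It is only an illustration under stated assumptions: the function name, the scalar `g` standing for f(t)e^{r(t − t_0)}, and the thresholds and rates used in the note below are hypothetical and are not taken from the paper's calibration.

```python
import math


def optimal_consumption(lam, thresholds, rates, alpha, gamma, g=1.0):
    """Sketch of rule (21) with power utility, u'(c) = c**(-gamma).

    thresholds[k] is the consumption level C_k at which bracket k starts
    (thresholds[0] must be 0) and rates[k] is the marginal rate tau_k in
    that bracket (non-decreasing). g stands for f(t) * exp(r * (t - t0)).
    All inputs are hypothetical illustration values.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")

    def big_lambda(c, tau):
        # Lambda_t(c) = u'(c) * g * (1 - alpha * tau), i.e. (19) with M_h = 0
        return c ** (-gamma) * g * (1.0 - alpha * tau)

    n = len(rates)
    for k in range(n):
        tau = rates[k]
        lower_c = thresholds[k]
        upper_c = thresholds[k + 1] if k + 1 < n else math.inf

        # Flat portion: Lambda(C_k) < lam < Lambda(C_k^-), so consumption sits at C_k.
        if k > 0 and big_lambda(lower_c, tau) < lam < big_lambda(lower_c, rates[k - 1]):
            return lower_c

        # Interior portion: Lambda(C_{k+1}^-) <= lam <= Lambda(C_k).
        lam_max = big_lambda(lower_c, tau) if lower_c > 0 else math.inf
        lam_min = big_lambda(upper_c, tau) if math.isfinite(upper_c) else 0.0
        if lam_min <= lam <= lam_max:
            return (lam / (g * (1.0 - alpha * tau))) ** (-1.0 / gamma)

    raise RuntimeError("no bracket matched; check that rates are non-decreasing")
```

With hypothetical thresholds `[0, 20000, 50000]`, rates `[0.10, 0.15, 0.25]`, `alpha = 1` and `gamma = 3`, any λ strictly between 0.85 × 20000⁻³ and 0.90 × 20000⁻³ returns the flat value 20,000. Setting `alpha = 0` (pure Roth) makes every flat interval empty, so only the smooth interior formula is ever used.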

3.3 Comparison with dynamic programming

In the previous literature, dynamic programming techniques have been the tool of choice for solving life-cycle models with risks and constraints, and it can be useful to contrast them with the approach described in this section. With dynamic programming, the problem is to solve the Bellman equation $V(W_t) = \max_{c_t} \left[ u(c_t) + \beta p_t E[V_{t+1}(W_{t+1})] \right]$ subject to the budget constraint in (11), which is essentially a discretized version of our earlier problem of maximizing L(c_t). Since the functional form for the value function $V_{t+1}(W_{t+1})$ is not known, it must be estimated numerically by backward induction for a set of values, and then interpolated between these points. With this interpolated function, the simplest approach to solve the Bellman equation is to test every possible value of c_t: this is a slow process and has limited precision unless a fine grid is used. The circularity issue mentioned previously also makes the process more time consuming.

Although several numerical techniques can be used to perform a more efficient search for optimal consumption, their application is not straightforward with discontinuities. For instance, most assume the existence of a single optimum or solution within a given interval, but Figure 1 illustrated that this premise can fail. Furthermore, techniques based on first-order conditions or related Euler equations have to be modified because the solution in (20) and (21) showed that several cases have to be considered. In other words, applying these techniques to problems with discontinuities requires some adjustments, and an analytical process such as the one in Section 3.2 can be followed to determine the nature of the changes.

The technique used in this paper is particularly interesting because the solution is based on a known budget constraint instead of an unknown value function, which eliminates the need for interpolation and leads to exact values. In addition, it is not necessary to numerically optimize $c_t^*$ as the solution is given in (20) and (21). Arguably, the approach suggested here is limited in the sense that it is based on a one-time risk, but the same concepts could be extended to multiple risky periods. With M possible risky paths, the crux of the problem would be to solve a system of M equations (the budget constraints, possibly as a series of closed-form expressions) and M unknowns λ₁, …, λ_M.Footnote ¹³ If this system can be solved in a reasonable amount of time, then the approach can be an interesting alternative to dynamic programming. With a one-time risk, the problem proved to be particularly manageable as it can be reduced to a one-equation-in-one-unknown problem that can be solved with the bisection method.
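The one-equation-in-one-unknown idea can be sketched in a few lines of Python. The example below is a deliberately stripped-down assumption, not the paper's budget constraint: it uses annual periods, no taxes, no mortality and no borrowing constraint, so that consumption is simply c_t = (λ e^{(β − r)t})^{−1/γ}. The function names and the search bounds are hypothetical.

```python
import math


def budget_gap(lam, incomes, r, beta, gamma):
    """Present value of income minus consumption for a given multiplier lam.

    Simplified setting (an assumption of this sketch): no taxes, no mortality,
    annual periods, and c_t = (lam * exp((beta - r) * t)) ** (-1 / gamma).
    """
    gap = 0.0
    for t, income in enumerate(incomes):
        consumption = (lam * math.exp((beta - r) * t)) ** (-1.0 / gamma)
        gap += math.exp(-r * t) * (income - consumption)
    return gap


def solve_lambda(incomes, r, beta, gamma, lo=1e-30, hi=1.0, iterations=200):
    """Bisection on a log scale; budget_gap is increasing in lam."""
    if budget_gap(lo, incomes, r, beta, gamma) > 0 or budget_gap(hi, incomes, r, beta, gamma) < 0:
        raise ValueError("the root is not bracketed by [lo, hi]")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint, since lam spans many magnitudes
        if budget_gap(mid, incomes, r, beta, gamma) > 0:
            hi = mid  # consumption too low: the multiplier must fall
        else:
            lo = mid  # consumption too high: the multiplier must rise
    return math.sqrt(lo * hi)
```

When β = r, consumption is constant in this simplified setting, so the result can be checked against the level annuity that exhausts the present value of income.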

4 Assumptions for numerical illustrations

We use the equations detailed in Appendix B and the assumptions detailed in this section to generate the numerical illustrations in the remainder of this paper. The individual starts the problem at age t_0 = 25 and lives up to age ω = 105. The model uses a discrete mortality table, which is calibrated with survival probabilities derived from the National Center for Health Statistics 2005 data.Footnote ¹⁴ The baseline parameters for the power utility function, the rate of return, and the time discount factor are, respectively, γ = 3, r = 3%, and β = 3%.

As in Cocco et al. (2005), the income profiles are calibrated using data from the Panel Study of Income Dynamics (PSID) for three education levels: less than high school (LHS), high school (HS), and college.Footnote ¹⁵ Figure 2 illustrates the income profiles for each of the three education groups. Social Security benefits are computed according to the formula applicable in 2010.Footnote ¹⁶ The annual benefits are respectively $13,239 for the less-than-high-school group, $16,481 for the high-school group, and $23,278 for those with a college degree. For the pension income assumption, the use of a baseline ‘average’ scenario is more problematic because many workers have no pension at all, while others have very generous defined benefit plans. To reflect this heterogeneity, the results are illustrated for three scenarios for y_R: no pension income, the highest pension income possible ($4,000 for LHS, $6,000 for HS, and $24,000 for college), and a mid-point.Footnote ¹⁷

Figure 2. Income profiles by education.

The payroll tax rate π is 7.65%. The tax brackets [B_k, B_{k+1}) and marginal tax rates τ_k are taken from schedule X of the 2010 IRS 1040 form for singles, which is reproduced in Table 1. The standard deduction is $5,700 and there is an additional $1,400 deduction for singles above age 65. Since the personal exemption for a single person is $3,650, a total of E = $9,350 before age 65 and E_R = $10,750 after age 65 is excludible from taxable income. The base amounts for the taxation of Social Security benefits are X^1 = $25,000 and X^2 = $34,000; they vary over time according to (7). The tax brackets, deductions, and bend points in the Social Security benefits formula are assumed to increase with inflation in the future. The base amounts used in the taxation of Social Security benefits are fixed in nominal terms; an inflation rate of 3% is used to reflect their decrease in real terms. Because this erosion affects cohorts differently, the results are presented for individuals who were respectively aged 25, 45, and 65 in 2010.
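The arithmetic behind the exclusion amounts, and the erosion of the fixed base amounts, can be reproduced with a short Python sketch. The dollar figures come from the paragraph above; the helper `real_base_amount` and its annual compounding convention are assumptions made for illustration, and equation (7) in the paper may use a different (for example continuous) convention.

```python
STANDARD_DEDUCTION = 5_700   # 2010, single filer
AGE_65_DEDUCTION = 1_400     # additional deduction above age 65
PERSONAL_EXEMPTION = 3_650   # 2010, single filer

# Amounts excludible from taxable income before and after age 65.
E = STANDARD_DEDUCTION + PERSONAL_EXEMPTION
E_R = E + AGE_65_DEDUCTION


def real_base_amount(nominal, years, inflation=0.03):
    """Real value of a fixed nominal base amount after `years` of inflation.

    Annual compounding is an assumption of this sketch.
    """
    return nominal / (1.0 + inflation) ** years


# Real value of the first base amount X^1 when a worker aged 25 in 2010 turns 65.
x1_real_at_65 = real_base_amount(25_000, years=40)
```

Under these assumptions the first base amount is worth less than a third of its 2010 value in real terms by the time the youngest cohort retires, which is why the taxation of Social Security benefits reaches further down the income distribution for younger cohorts.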

Table 1. Marginal tax rates

Source: Schedule X of the 2010 IRS 1040 form for singles.

5 Endogenous tax rates and optimal consumption patterns

With standard problems, marginal tax rates are exogenously given. In contrast, the rates that apply to traditional accounts in this model are endogenously determined and can vary with time. To give the reader a sense of these rates, the bottom part of Figure 3 graphs the marginal tax rates τ_k (straight lines) and the effective marginal tax rates τ_k(1+M_h) (dashed lines) for the traditional account solution as a function of age. Six representative cases with different education, cohort, and pension income are considered: Case 1 (no tax after retirement), Case 2 (few taxes after retirement), Case 3 (mixed rates after retirement), Case 4 (slightly higher taxes after retirement), Case 5 (much higher taxes after retirement), and Case 6 (slightly higher taxes after retirement).

Figure 3. Examples of optimal consumption patterns.

In our illustrations, the marginal tax rate before retirement is 15% for the LHS/HS groups and 25% for the college group. After retirement, effective marginal tax rates can be higher or lower and several rates can apply to the same person. The results in Figure 3 underscore the role played by the taxation of Social Security benefits: without it, Roth accounts would never be strictly preferred in our illustrations as the marginal tax rates τ_k either stay the same or decrease after retirement. When M_h is taken into account, in many cases the resulting effective rates are above the pre-retirement rates, making Roth accounts potentially attractive. The question then is who is affected by the taxation of Social Security benefits. Currently, it is mostly those with higher incomes – in our illustrations, those in the college group. However, Social Security's base amounts are not indexed for inflation, which means that the situation will eventually apply to those in the high-school group for younger cohorts.

To show the connection between marginal tax rates and consumption, the top portion of Figure 3 gives the optimal consumption patterns for the traditional and Roth cases. In the figure, the link is evident: consumption is higher when marginal tax rates are lower (and vice-versa). Although the optimal consumption patterns for the traditional and Roth cases are similar in the sense that savings start and end around the same ages, they differ in that the Roth solution is smooth and the traditional solution is more jagged.Footnote ¹⁸ As explained in Section 3, the solution goes through flat portions (at C_k or C_h) when there is a transition between two tax rates, which can be observed in Cases 2, 3, 5, and 6. When M_h changes from 0% to 85%, it is also possible to have a jump in consumption: Case 3 provides an example of this when consumption suddenly declines by $2,000 at age 82.

6 Who gains by choosing Roth accounts?

If an individual can only invest in a traditional or a Roth account, which one of the two is most beneficial? To answer that question, the first column of Table 2 presents a dollar measure of the welfare gains/losses that traditional accounts generate over their Roth counterparts. Appendix C details the equation used for that computation and Table 2 gives the results for each of the education groups (LHS, HS, and college), for three different cohorts (ages 25, 45, and 65 in 2010), and for three levels of pension income.Footnote ¹⁹

Table 2. Welfare gains/losses and related measures without tax risk

¹ This is the percent increase in marginal tax rates after retirement which would make the individual indifferent between the traditional and Roth accounts. N/A indicates that the individual does not pay taxes after retirement or so little that it would take an increase over 80% to make the individual indifferent.

² For the case where the individual invests all his savings in a Roth account, this column gives the present value of the increase in taxes that would result if withdrawals from Roth accounts counted in the taxation of Social Security benefits.

The results in Table 2 indicate that in most cases choosing traditional accounts over Roths generates a welfare gain. The exceptions occur mainly when the taxation of Social Security benefits pushes the effective marginal tax rates after retirement above their pre-retirement levels, as illustrated in Cases 3–6 of Figure 3.Footnote ²⁰ As discussed in the previous section, this mostly affects those in the college group for the current cohort of retirees, but will eventually affect those in the high-school group for the youngest generation. For example, Roth accounts are preferred by those who are currently 25 years old with y_R = $6,000.Footnote ²¹ Those in the youngest generation who still prefer traditional accounts are also affected by the taxation of Social Security benefits as it cuts their welfare gains by about half.

6.1 Breakeven increase in future tax rates

Since an increase in future tax rates would favor Roth accounts, it is interesting to ask: What is the magnitude of the increase needed to change the traditional account recommendation? To answer that question, recall that the model in Section 2 allows for a multiplicative increase θ that applies to all marginal tax rates after retirement. Using this framework, we solve for the breakeven increase θ that would make the individual indifferent between traditional and Roth accounts and present the results in the second column of Table 2. Note that the breakeven rates are negative for those who initially prefer Roth accounts. The table indicates ‘N/A’ for those who pay no (or very little) taxes after retirement: since changes in future marginal tax rates are not an issue for them, they will prefer traditional accounts no matter what. Future tax changes, however, are relevant for those with a college degree and in a few of the high-school cases. Table 2 shows that for them, breakeven rates range between 4% and 41%. Those with high pensions have low breakeven rates and can be affected by even small changes in tax rates. In contrast, those with a college degree and no pension income are less sensitive and would require a more substantial increase of 41% (age 45 in 2010) or 33% (age 25 in 2010) to justify the switch to Roth accounts. To put this in perspective, a 33% increase would change the marginal tax rates in Table 1 to 0%, 13%, 20%, 33%, 37%, 44%, and 47%.
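The scaled rates quoted above can be checked with a few lines of Python. The base rates used here are the 2010 Schedule X rates for singles (10%, 15%, 25%, 28%, 33%, 35%) preceded by a 0% rate for income covered by deductions and exemptions; the function name is hypothetical and the rounding to whole percentages is an assumption about how the figures in the text were reported.

```python
# 2010 Schedule X marginal rates for singles, in percent, preceded by the
# 0% rate that applies to income covered by deductions and exemptions.
BASE_RATES = [0, 10, 15, 25, 28, 33, 35]


def scale_rates(rates, increase):
    """Apply a multiplicative increase (e.g. 0.33 for 33%) to every rate."""
    return [rate * (1.0 + increase) for rate in rates]


scaled = scale_rates(BASE_RATES, 0.33)
rounded = [round(rate) for rate in scaled]
print(rounded)  # [0, 13, 20, 33, 37, 44, 47]
```

The increase is multiplicative rather than additive, so a 33% breakeven moves the top rate by about 12 percentage points but the 10% rate by only about 3.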

Although Roth accounts are immune to increases in future marginal tax rates, it should be acknowledged that they are subject to different risks. For example, a change in the tax law could include withdrawals from Roth accounts in the taxation of Social Security benefits. To illustrate the magnitude of this issue, the third column of Table 2 gives the resulting increase in the present value of taxes for those who save with Roth accounts. This amount can be quite substantial, reaching up to $18,443. If this change were expected with certainty, Roth accounts would never be optimal in our illustrations. In addition, Roth accounts would lose some of their appeal for younger cohorts if Social Security's base amounts became indexed for inflation. Of course, substantial tax reforms would also affect the comparison; for example, Kotlikoff et al. (2008) evaluate the impact of a change to a consumption tax. Unfortunately, most strategies for retirement savings are not truly risk-free as long as the tax code can change, and it is difficult to assign a probability distribution to these changes.

6.2 Naïve diversification strategies

Discussions of tax risk in the context of retirement savings are often accompanied by a suggestion to diversify among saving vehicles. For example, in a Vanguard document, Ahern et al. (2005) state: ‘Pre-tax savings are more beneficial if a participant is in a lower tax bracket in retirement; Roth savings are more beneficial if a participant is in a higher bracket. In a world of uncertain future tax rates, participants should diversify. Just as they hold fixed income assets to diversify the risks of stocks, so participants should hold Roth savings to diversify the risks associated with pre-tax savings’. To investigate the potential benefits of having both types of accounts, we use the naïve diversification strategy defined in Section 2 where the individual allocates a constant proportion α of savings/withdrawals to the traditional account and a proportion 1 − α to the Roth account. Of course, this strategy is limited in the sense that the individual cannot adjust α in every period, but it gives us a simple platform to assess the welfare gains stemming from risk reduction. In the next subsection, we will discuss how mixed strategies can be improved over naïve ones.

Before evaluating the risk reduction benefits associated with a mixed strategy, we must first recognize that this approach can increase welfare even in the absence of risk. To understand why, consider the following example where the marginal tax rates before and after retirement are respectively 15% and 10%, but withdrawals in excess of $10,000 trigger the taxation of Social Security benefits and are subject to an effective marginal tax rate of 18.5%. An individual who has to make annual withdrawals of $15,000 can use a naïve strategy with α = 2/3 and withdraw $10,000 from a traditional account and $5,000 from a Roth account. This strategy is beneficial because it allows the individual to gain from the lower tax rate of 10%, while avoiding the higher rate of 18.5%.
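The example above can be turned into a small Python sketch that searches over α. It rests on simplifying assumptions that are not in the paper: all amounts are treated as pre-tax-equivalent dollars, the Roth share is charged the 15% pre-retirement rate at contribution, and the timing of tax payments is ignored (both accounts are assumed to earn the same return). The function name and the grid are hypothetical.

```python
def lifetime_tax(alpha, amount=15_000.0, pre_rate=0.15, post_rate=0.10,
                 threshold=10_000.0, high_rate=0.185):
    """Tax paid on one year's retirement funding when a share alpha is traditional.

    Traditional dollars are taxed at withdrawal: post_rate up to `threshold`,
    high_rate beyond it. Roth dollars were taxed at pre_rate when contributed.
    Amounts are pre-tax-equivalent dollars (an assumption of this sketch).
    """
    traditional = alpha * amount
    roth = amount - traditional
    low_part = min(traditional, threshold)
    high_part = max(traditional - threshold, 0.0)
    tax_at_withdrawal = post_rate * low_part + high_rate * high_part
    tax_at_contribution = pre_rate * roth
    return tax_at_withdrawal + tax_at_contribution


# Search a grid of traditional shares in steps of 1/30.
grid = [i / 30 for i in range(31)]
best_alpha = min(grid, key=lifetime_tax)
```

Under these assumptions the tax bill falls as α rises to 2/3, because each traditional dollar swaps a 15% rate for a 10% rate, and rises afterwards, because additional traditional dollars face 18.5% instead of 15%.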

For the case without tax risk, we compute the welfare gains associated with each α from 0 to 1 and thus are able to find the optimal α. The results are presented in the fourth and fifth columns of Table 2 for each of the cases considered previously. Figure 4 also gives a graphical representation of the welfare gains as a function of α for three representative cases. Those who do not encounter issues with the taxation of Social Security find it optimal to allocate 100% of savings to traditional accounts. A 100% Roth strategy is optimal for those whose effective marginal tax rates after retirement are higher than before retirement even before making any withdrawal. For those who are in the situation described in the previous paragraph (about 40% of our cases), it is optimal to divert some (but not all) savings to the Roth account to avoid the higher marginal tax rates. Note that the welfare gains with the optimal α are always non-negative: losses with traditional accounts can be avoided when there is an option to allocate part of savings to a Roth account. The value of the option to invest a fraction of savings in Roth accounts can be computed by taking the difference between the two welfare gains in Table 2: the last column of Table 2 shows that this value is on average $1,800 when positive, with a maximum of $4,518.Footnote ²²

Figure 4. Welfare gains with naïve diversification strategies (over Roth accounts) for selected cases (age 45 in 2010).

We now introduce tax variability with a simple no-drift scenario where marginal tax rates after retirement can go up or down by 20%. Using Section 2's notation, this translates into p_1 = p_2 = 50%, θ_1 = 120%, and θ_2 = 80%. Figure 4 contrasts the welfare gains for the cases with tax risk (dashed lines) and without tax risk (solid lines). For those in the LHS/HS education groups, the lines coincide and tax risk has essentially no impact on welfare gains. For those in the college group, we observe a small difference and tax risk reduces welfare by at most $409 when α = 100%. The loss attributable to tax risk diminishes with diversification; for example, it is cut by $281 when α = 50%. However, Figure 4 shows that risk reduction benefits are not the only consideration when choosing α: gains or losses in the scenario without risk must also be taken into account. In the previous example with α = 50%, diversification is suboptimal because the certain loss ($1,580) is much higher than the risk reduction benefits ($281).
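The mechanism can be illustrated with a static, one-period Python sketch. This is not the paper's life-cycle computation: only the scenario (θ of 120% or 80% with equal probability) and γ = 3 come from the text, while the withdrawal amount, the 25% rates, and the function names are hypothetical values chosen for illustration.

```python
def certainty_equivalent(outcomes, probabilities, gamma):
    """Certainty equivalent under u(c) = c**(1 - gamma) / (1 - gamma), gamma != 1."""
    expected = sum(p * c ** (1.0 - gamma) for c, p in zip(outcomes, probabilities))
    return expected ** (1.0 / (1.0 - gamma))


def risk_loss(alpha, withdrawal=30_000.0, tau_pre=0.25, tau_post=0.25,
              thetas=(1.2, 0.8), probabilities=(0.5, 0.5), gamma=3.0):
    """Certainty-equivalent loss from tax variability in a one-period example."""

    def consumption(theta):
        # The traditional share is taxed at theta * tau_post at withdrawal;
        # the Roth share was already taxed at tau_pre and is unaffected by theta.
        return withdrawal * (1.0 - alpha * theta * tau_post - (1.0 - alpha) * tau_pre)

    risky = [consumption(theta) for theta in thetas]
    no_risk = consumption(1.0)  # the expected value of theta is 1 (no drift)
    return no_risk - certainty_equivalent(risky, probabilities, gamma)


losses = {alpha: risk_loss(alpha) for alpha in (0.0, 0.5, 1.0)}
```

In this toy setting the loss is zero for a pure Roth strategy and shrinks more than proportionally as α falls from 100% to 50%, yet it remains small relative to the withdrawal, which is in line with the limited risk reduction benefits reported above.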

More generally, in most of our illustrations the magnitude of the risk reduction benefit that a given α brings is much smaller than the corresponding gain or loss in the no-risk case. Indeed, the optimal proportions allocated to traditional accounts in Table 2 are essentially the same for the cases with and without tax risk. Although intuitive at first, the analogy with a diversified portfolio of stocks and bonds does not translate well because traditional and Roth accounts can have great differences in expected values, which dominate the volatility effect. On the other hand, the certainty case showed that the peculiar nature of the tax structure provides a new motivation for diversifying: a mix of traditional and Roth accounts can have a higher expected value than a linear combination of the pure cases.

6.3 Mixed strategies

The naïve diversification strategy suggested in the previous section can be fine-tuned to further improve welfare. For instance, instead of mixing both accounts in every period before retirement, the optimal strategy is likely to involve a switch between periods where contributions are either 100% traditional or 100% Roth.Footnote ²³ Unless there are major fluctuations in income, there should be relatively few transitions between the two accounts. Starting with the Roth account can be justified by lower marginal tax rates at that time or by concerns about early withdrawals and the 10% penalty tax.Footnote ²⁴ Conversely, starting with the traditional account is preferable when these are not an issue and marginal tax rates decline after retirement.Footnote ²⁵ If savings in the traditional account eventually reach a level such that the marginal tax rate after retirement exceeds the pre-retirement one, a permanent switch to Roth accounts would be recommended. Temporary moves to the Roth side could also be motivated in periods of income loss or with particularly high tax deductions. After retirement, the strategy would be a generalization of our earlier example with α = 2/3 to every period: withdrawals would be made from the traditional account first, and α would be chosen such that higher marginal tax rates are avoided.

7 Retirement savings

The next question investigated is whether tax deductible contributions increase retirement savings. If so, is it an income or a substitution effect? In other words, savings may increase either because they are augmented by a tax subsidy or because people sacrifice more consumption before retirement. For this discussion, a measure of gross retirement savings can be obtained by computing the accumulated value of savings as follows:

(23)

$$\text{Gross retirement savings} = W_R = \int_{T_1}^{R} e^{r(R - t)}\, s(c_t^*)\, dt.$$

In the traditional account case, gross retirement savings are inflated in the sense that taxes will have to be paid on withdrawals. Accordingly, we also define a net measure of retirement savings where a ‘tax liability’ is deducted. This tax liability measures the present value of taxes attributable to withdrawals (s_t) and it can be obtained by taking the difference between taxes with and without withdrawals as follows:

(24)

$$\begin{aligned} \text{Tax liability} &= \int_{R}^{\omega} e^{-r(t - R)} \left[ \text{tax}_t(s_t) - \text{tax}_t(0) \right] dt \\ \text{Net retirement savings} &= W_R - \text{Tax liability}. \end{aligned}$$
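A discrete-time analogue of (23) and (24) can be sketched as follows. The annual time step, the time-invariant `tax` callable, and the function names are assumptions of this illustration; in the paper the tax function depends on t and the integrals are evaluated in continuous time.

```python
import math


def gross_savings(savings, r):
    """Discrete analogue of (23): accumulate each year's saving to retirement.

    savings[i] is the amount saved i years after T_1; retirement occurs
    len(savings) years after T_1.
    """
    n = len(savings)
    return sum(math.exp(r * (n - i)) * s for i, s in enumerate(savings))


def tax_liability(withdrawals, r, tax):
    """Discrete analogue of (24): value at retirement of taxes caused by withdrawals.

    tax(w) is a hypothetical function returning total tax when w is withdrawn,
    so tax(w) - tax(0) isolates the part attributable to the withdrawal.
    """
    return sum(math.exp(-r * j) * (tax(w) - tax(0.0)) for j, w in enumerate(withdrawals))


def net_savings(savings, withdrawals, r, tax):
    """Gross retirement savings minus the tax liability on withdrawals."""
    return gross_savings(savings, r) - tax_liability(withdrawals, r, tax)
```

For a Roth account, or for a retiree whose income is too low to owe tax, `tax(w) - tax(0)` is zero in every year and the net and gross measures coincide.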

7.1 Increases in retirement savings

Table 3 presents the gross and net levels of retirement savings at age 65 for each scenario, assuming that the individual was aged 45 in 2010. By education group, these range from: $40,000–$100,000 (LHS), $60,000–$140,000 (HS), and $50,000–$350,000 (college). Not surprisingly, higher income translates into higher savings, whereas higher pensions reduce the need to accumulate wealth. The bottom part of Table 3 tests the sensitivity of these results to the following changes in parameters: a = 25, a = 45, r = 1%, r = 5%, β = 1%, β = 5%, γ = 1, γ = 5, a reduction of 25% in Social Security benefits, and an increase of $5,000 in the exemption amounts E and E_R. All cases are considered in the analysis, but due to space limitations the tables present only the averages for all education/income categories. The results show that the level of retirement savings is very sensitive to the choice of parameters, being halved or almost doubled in some scenarios.

Table 3. Retirement savings at age 65 (in dollars; age 45 in 2010)

The most salient result in Table 3 is the marginal impact of traditional accounts on retirement savings. First, those in the less-than-high-school category increase their savings by about $15,000. In their case, the gross and net differences are the same because they do not have enough income after retirement to pay taxes. The differences for the high-school group are much smaller: the gross increase is about $6,000 and the net increase averages only $200. Finally, the college group displays the most striking results. In their case, the gross increase can be quite substantial, reaching up to $68,000. However, once the increase in the tax liability is considered, this gain mostly vanishes, with an average of only $2,800. Testing the sensitivity of these results, we find that the meaningful increase in net retirement savings for the less-than-high-school group is generally robust. The high-school group exhibits more variable results, ranging from −$20,000 to $20,000. The college group displays a pattern mostly similar to the one observed in the top portion of Table 3: a high increase in gross retirement savings is experienced, but most of it goes away once the net values are considered.

7.2 Income and substitution effects

Traditional accounts are more effective at increasing savings in some cases than others. To understand why, this section suggests a breakdown for the increase in retirement consumption (or equivalently the increase in net retirement savings) into an income effect and a substitution effect. The income effect comes from the additional tax subsidy that tax deductible contributions generate. It can be computed as the difference in the lifetime value of taxes as follows:

(25)

$$\text{TS} = \text{Tax subsidy} = \int_{t_0}^{\omega} e^{-r(R - t)} \left[ \text{tax}_t^{\text{Roth}}(s_t) - \text{tax}_t^{\text{Trad}}(s_t) \right] dt.$$

It should be noted that retirement consumption does not increase by the entire amount of the tax subsidy, because the wealth effect increases consumption in all periods, both before and after retirement.Footnote ²⁶ Introducing the notation $C_{t,T} = \int_t^T e^{-r(R - s)}\, c_s^*\, ds$, the proportion of the tax subsidy allocated to retirement consumption can be estimated with the following ratio:

(26)

$$q = \text{Propensity to consume after retirement} = \frac{C_{R,\omega}}{C_{t_0,\omega}}.$$

A reduction in tax rates after retirement not only creates a subsidy but also a substitution effect by lowering the relative price of post-retirement consumption. The value of the new savings before retirement associated with the substitution effect can be computed with $\text{New Savings} = C_{t_0,\omega}^{\text{Roth}} \left( q^{\text{Trad}} - q^{\text{Roth}} \right)$. Accordingly, the total changes in pre- and post-retirement consumption Footnote ²⁷ can be written as

(27)

$$\begin{aligned} \text{Change in pre-retirement consumption} &= (1 - q^{\text{Trad}}) \cdot \text{TS} - \text{New Savings} \\ \text{Change in retirement consumption} &= q^{\text{Trad}} \cdot \text{TS} + \text{New Savings} \end{aligned}$$

Table 4 presents the results of the decomposition for the change in retirement consumption along with the tax subsidies and the applicable effective marginal tax rates. The table includes the previous cases from Figure 3 and we will use them as representative examples. The average tax subsidy in Table 4 is about $10,000; since q^Trad is generally around 20%, the average increase in retirement consumption attributable to the tax subsidy is only $2,000. By themselves, subsidies can be a relatively expensive way to generate a small increase in retirement consumption. For example, the increase is at best $9,240 in Case 6, but the total cost of the subsidy is $48,396.
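The decomposition in (27) is simple enough to express as a short Python sketch. The function name is hypothetical; the first call uses the rounded averages quoted above (a $10,000 subsidy and q^Trad of 20%) with the substitution effect switched off, and the $3,000 of new savings in the second call is an invented value used only to show how the substitution effect shifts consumption.

```python
def decompose(tax_subsidy, q_trad, new_savings):
    """Split a tax subsidy into changes in pre- and post-retirement consumption, as in (27)."""
    change_pre = (1.0 - q_trad) * tax_subsidy - new_savings
    change_post = q_trad * tax_subsidy + new_savings
    return change_pre, change_post


# Income effect only: 20% of a $10,000 subsidy reaches retirement consumption.
income_only = decompose(10_000.0, 0.20, 0.0)

# Hypothetical substitution effect of $3,000 moved from pre- to post-retirement.
with_substitution = decompose(10_000.0, 0.20, 3_000.0)
```

Whatever the size of the substitution effect, the two changes always add up to the tax subsidy: new savings only reallocate consumption between the two phases of life.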

Table 4. Increase in retirement consumption

The substitution effect can generate larger increases in retirement consumption than the income effect, but only for some groups. It works well for those who pay little or no taxes after retirement, for example in Cases 1 and 2 the associated increases in retirement consumption are $13,745 and $14,248. For those who pay meaningful taxes after retirement, the taxation of Social Security benefits again changes the cards and reduces the substitution effect's potential. In many cases, marginal tax rates increase and the substitution effect is actually negative. For example, in Case 5 the marginal tax rate after retirement jumps to 27.75% and retirement consumption is reduced by $13,214.

To conclude this section, it is interesting to put the tax subsidies in perspective by observing that they move in sync with the welfare gains in Table 2. In other words, welfare gains associated with the tax deductibility of contributions arise because they are paid for by tax subsidies. Tax deductible contributions do not generate a benefit above their cost: on average, welfare gains are $500 less than tax subsidies because consumption patterns are disrupted. Although tax deductible contributions benefit most people in partial equilibrium, it should be kept in mind that this is not necessarily the case in general equilibrium where tax subsidies have to be financed by an increase in other taxes.

8 Conclusion

The results in this paper have a number of interesting applications for financial planners and 401(k)s. For financial planners who advise clients on their Roth/traditional decision, it is a good news, bad news story. The bad news is that the case for which we can determine with more conviction that traditional accounts are superior is for those who have lower incomes (i.e. those who are less likely to seek advice). For those with higher incomes and some pensions, the results are often not as clear cut; the good news is that welfare losses from making the wrong decision are not excessively high. Indeed, we show that a mixed traditional/Roth strategy can improve welfare when it helps to avoid the higher marginal tax rates due to the taxation of Social Security benefits. In contrast, we find that naïve diversification strategies offer limited risk reduction benefits when tax risk takes only the form of variability in future tax rates.

Our findings also have applications in the realm of 401(k)s: recently, employees were given the possibility of directing their 401(k) contributions to a Roth account, but employer matching contributions can only go to a traditional account. As some employees can lose with traditional accounts, our results suggest that extending the Roth opportunity to employer contributions would benefit these employees. Moreover, default strategies are becoming increasingly important in 401(k) plans. Although these have focused on contribution levels and asset allocation, it would also be interesting to consider which type of retirement account (traditional or Roth) should be set as a default option. This paper illustrates results for a wide range of cases and offers a starting point for this type of analysis. In particular, this paper hopes to raise awareness of the underappreciated role played by the taxation of Social Security benefits.

This paper also offers new developments from a methodological perspective. By exploiting a dual approach, we illustrate how an analytical framework can be retained even when incorporating elements such as borrowing constraints, risks, and discontinuities. The analytical approach also provides valuable new insight by showing that the solution can be based on a system of known budget constraint equations. The benefit of this alternative formulation over more conventional dynamic programming is twofold: it produces exact values and eliminates the time-consuming process of estimating an unknown value function with numerical optimization and backward induction.

Finally, this paper found that the differences between Roth and traditional accounts are limited to some groups within the context of a standard life-cycle model analysis. The interesting question then is whether there are also practical considerations that affect the comparison. For example, the tax deductibility of contributions may be associated with behavioral effects that lead people to save more than they would with Roth accounts. That could be the case if the value of the immediate tax refund looms larger than the associated tax liability in the decision. Another intriguing issue is the fact that the tax deductibility of contributions can increase gross savings significantly. This is advantageous for an individual who is able to achieve superior investment returns. Similarly, having more assets under management is obviously beneficial for the industry. On the other hand, postponing tax receipts might not be desirable for cash-strapped governments. In addition, unsophisticated investors are prone to investment mistakes and this problem is amplified with larger assets. These potential issues are left for future research.

Appendix A: Technical details for the solution of Section 2.3's optimization problem

A.1 Standard Lagrangian and dual approach

To prove that a solution to an optimization problem with constraints is optimal, the standard approach is to start by constructing a Lagrangian function where each constraint is multiplied by its Lagrange multiplier and the result appended to the objective function as follows:

(A.1)

$$L = E\left[ \int_{t_0}^{\omega} f(t)\, u(c_t)\, \mathrm{d}t + \mu W_{t_0} + \int_{t_0}^{\omega} \eta_t W_t\, \mathrm{d}t \right].$$

In (A.1), $W_t$ is the wealth process and $\mu$ and $\eta_t$ denote, respectively, the Lagrange multipliers for the budget constraint and the borrowing constraint at time $t$. In state $i$ after retirement, $p_i \mu_i$ and $p_i \eta_t^i$ are used instead. To reduce clutter, we omit the subscripts $i$ in this Appendix unless necessary. With this notation defined, an optimal solution must satisfy the following four Karush–Kuhn–Tucker (KKT) necessary conditions for all $t$: (1) $W_0^{*} = 0$ and $W_t^{*} \geqslant 0$, (2) $\eta_t \geqslant 0$, (3) $\eta_t W_t^{*} = 0$, and (4) $L'(c_t^{*}) = 0$. As mentioned in Section 3, the solution $c_t^{*}$ to $L'(c_t^{*}) = 0$ gives us little insight in terms of showing that $W_t^{*} \geqslant 0$ everywhere.

To overcome this problem, the dual approach suggested in Lachance (2012) is used and adapted to handle tax risk and discontinuities. Following He and Pages (1993) and applying integration by parts,²⁸ the Lagrangian in (A.1) can be rewritten in terms of a process $X(t)$ as follows:

$$L = E\left[ \int_{t_0}^{\omega} \left( f(t)\, u(c_t) + X(t)\, \mathrm{e}^{-r(t - t_0)} s_t(c_t) \right) \mathrm{d}t \right],$$

where

(A.2)

$$X(t) = E\left[ \mu + \int_t^{\omega} \eta_s\, \mathrm{d}s \right] \ \text{for}\ t < R \quad \text{and} \quad X^i(t) = \mu_i + \int_t^{\omega} \eta_s^i\, \mathrm{d}s \ \text{for}\ t \geqslant R.$$

With this form, the dual approach can be applied as a two-step process: (1) find $c_t^{*}$ that unconditionally maximizes the Lagrangian and (2) substitute the result in $L$ to find the process $X(t)$ that minimizes $L$.²⁹ The advantage of this formulation is that it does not require that we show that $W_t^{*} \geqslant 0$ everywhere. The first step is easy because we solve an unconstrained problem instead of a constrained one; the solution for $c_t^{*}$ is given in Section 3.2 and is not repeated here. The next section explains how to solve for $X(t)$ in the second step.

A.2 Solution for X(t)

Binding periods: In periods where the borrowing constraint is binding, the individual consumes as much as he can and $c_t^{*} = \bar{y}_t = y_t - \mathrm{tax}_t(0)$. Substituting this in equation (17) and replacing $\lambda$ by $X(t)$, we can invert to obtain the only possible solution $X(t) = \lambda(t)$, where

(A.3)

$$\lambda(t) = u'(\bar{y}_t)\, f(t)\, \mathrm{e}^{r(t - t_0)} \left( 1 - \alpha \tau_k (1 + M_h) \right).$$

Note that binding periods can only happen in periods when λ(t) is decreasing since X(t) must be decreasing in these periods.

Non-binding periods: Within a period where $W_t^{*} > 0$, by construction $X(t)$ must be constant and we denote this by $X(t) = \lambda$.³⁰ With tax risk, a different constant $\lambda_i$ is used in each state $i$ after retirement. If there is more than one non-binding period, a different (and decreasing) constant $\lambda$ would be used in each separate period. For simplicity, we use a single period $[T_1, T_2]$ below, but the same concepts would apply with multiple periods.

Connection points between periods: At the connection points $T_1$ and $T_2$, the solution is generally continuous and $\lambda(T_1) = \lambda = \lambda(T_2)$.³¹ With risk, this condition becomes:

(A.4)

$$\lambda(T_1) = \lambda = \sum_{i=1}^{N} p_i \lambda_i = E[\lambda(T_2)] \quad \text{where} \quad \lambda_i = \lambda(T_2^i).$$

Within intervals where $\lambda(t)$ is strictly decreasing, this condition can be used to express $T_1$ and $T_2$ as inverse functions $T_1(\lambda)$ and $T_2(\lambda)$. For the special case where $T_1 = t_0$, the condition $\lambda(T_1) = \lambda$ becomes $\lambda \geqslant \lambda(t_0)$.

Budget constraint and $X(t)$ that minimizes $L$: For each state $i = 1, \ldots, N$, the budget constraint can be written as the present value of savings over the interval $[T_1, T_2^i]$ with

(A.5)

$$W(\lambda, \lambda_i) = \int_{T_1(\lambda)}^{R} \mathrm{e}^{-r(t - T_1(\lambda))} s(c_t^{*}(\lambda))\, \mathrm{d}t + \int_{R}^{T_2^i(\lambda_i)} \mathrm{e}^{-r(t - T_1(\lambda))} s(c_t^{*}(\lambda_i))\, \mathrm{d}t = 0.$$

Restricting the processes X(t) to those that satisfy the budget constraint, the second component in L is zero. L becomes:

(A.6)

$$L = E\left[ \int_{t_0}^{T_1(\lambda)} f(t)\, u(\bar{y}_t)\, \mathrm{d}t + \int_{T_1(\lambda)}^{T_2(\lambda)} f(t)\, u(c_t^{*}(\lambda))\, \mathrm{d}t + \int_{T_2(\lambda)}^{\omega} f(t)\, u(\bar{y}_t)\, \mathrm{d}t \right]$$

and we can show that $\mathrm{d}L/\mathrm{d}\lambda < 0$. Since $L$ decreases with $\lambda$, the criterion ‘$X(t)$ that minimizes $L$’ reduces to choosing the process with the highest $\lambda$ among those that satisfy the budget constraint. The next section gives a practical algorithm to solve for $\lambda$.

A.3 Algorithm to solve for λ

To develop a practical algorithm to solve for $\lambda$, the key is to formulate the problem as an equation $g(\lambda) = 0$ to which the bisection method can be applied. Recall that on an interval $(\lambda_1, \lambda_2)$, the intermediate value theorem guarantees the existence of a solution to $g(\lambda) = 0$ if $g$ is continuous with $g(\lambda_1) < 0$ and $g(\lambda_2) > 0$, and this solution is unique if $g'(\lambda) > 0$ for all $\lambda \in (\lambda_1, \lambda_2)$. From the previous section, our problem is to find values $\lambda$ and $\lambda_i$ such that the $N$ budget constraint equations $W(\lambda, \lambda_i) = 0$ are satisfied, the condition $\lambda = \sum_{i=1}^{N} p_i \lambda_i$ is met, $\lambda \geqslant \lambda(t_0)$ if $T_1 = t_0$, $\lambda = \lambda(T_1)$ if $T_1 > t_0$, $\lambda_i = \lambda(T_2^i)$, and $\lambda'(t) < 0$ for all $t \in [t_0, T_1) \cup [T_2^i, \omega)$. For a given $\lambda$, retirement savings can be expressed as a function $W_R(\lambda)$. When $W_R(\lambda) > 0$, the budget constraint equation $W(\lambda, \lambda_i) = 0$ allows us to express $\lambda_i$ as an inverse function $\lambda_i(\lambda)$ with $\lambda_i'(\lambda) < 0$.³² If there is only one period $[t_0, \tilde{t}\,)$ before retirement where $\lambda'(t) < 0$, we can define a value $\bar{\lambda} \in (\lambda(\tilde{t}), \infty)$ such that retirement wealth is positive for all $\lambda > \bar{\lambda}$.³³ This also allows us to define an inverse function $T_1(\lambda)$ with $\lambda'(t) < 0$ for all $t \in [t_0, T_1)$.

Our problem can then be formulated as a single equation $g(\lambda) = \lambda - \sum_{i=1}^{N} p_i \lambda_i(\lambda) = 0$ with $g'(\lambda) > 0$. The range $(\lambda_1, \lambda_2) = (\bar{\lambda}, \infty)$ of values of $\lambda$ to consider can be divided into two sub-intervals $(\bar{\lambda}, \lambda(t_0))$ and $[\lambda(t_0), \infty)$ where respectively the case $T_1 > t_0$ applies if $g(\lambda(t_0)) > 0$ and the case $T_1 = t_0$ applies if $g(\lambda(t_0)) \leqslant 0$. For the case $T_1 > t_0$, when $g(\lambda(t_0)) > 0$ there exists a unique solution $\lambda \in (\bar{\lambda}, \lambda(t_0))$ to $g(\lambda) = 0$ since we can show that $g(\bar{\lambda}) < 0$. For the case $T_1 = t_0$, when $g(\lambda(t_0)) \leqslant 0$ there exists a unique solution $\lambda \in [\lambda(t_0), \infty)$ to $g(\lambda) = 0$ since we can show that $g(\infty) > 0$.³⁴
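As a minimal sketch of how the bisection step described above could be coded, the Python snippet below solves $g(\lambda) = \lambda - \sum_i p_i \lambda_i(\lambda) = 0$ on a bracket where $g$ changes sign. The helper names (`bisect_root`, `make_g`), the finite bracket, and the toy functions $\lambda_i(\lambda) = k_i/\lambda$ are assumptions made purely for illustration; in the model, each $\lambda_i(\lambda)$ would instead be obtained by inverting the budget constraint $W(\lambda, \lambda_i) = 0$.

```python
from typing import Callable, Sequence


def bisect_root(g: Callable[[float], float], lo: float, hi: float,
                tol: float = 1e-10, max_iter: int = 200) -> float:
    """Root of an increasing function g on [lo, hi], assuming g(lo) < 0 < g(hi)."""
    if not (g(lo) < 0.0 < g(hi)):
        raise ValueError("the root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0.0 or hi - lo < tol:
            return mid
        if g_mid < 0.0:
            lo = mid   # g is increasing, so the root lies to the right
        else:
            hi = mid   # the root lies to the left
    return 0.5 * (lo + hi)


def make_g(probs: Sequence[float],
           lambda_i_funcs: Sequence[Callable[[float], float]]) -> Callable[[float], float]:
    """Build g(lam) = lam - sum_i p_i * lambda_i(lam)."""
    def g(lam: float) -> float:
        return lam - sum(p * f(lam) for p, f in zip(probs, lambda_i_funcs))
    return g


# Hypothetical two-state illustration (not the paper's calibration):
# each lambda_i(lam) = k_i / lam is decreasing in lam, so g is increasing.
probs = [0.5, 0.5]
lambda_i_funcs = [lambda lam: 1.0 / lam, lambda lam: 3.0 / lam]
g = make_g(probs, lambda_i_funcs)

# Here g(lam) = lam - 2 / lam, whose positive root is sqrt(2).
lam_star = bisect_root(g, 0.1, 10.0)
print(lam_star)
```

Because the upper end of the search range is unbounded in the text, a practical implementation would presumably first enlarge `hi` until `g(hi) > 0` before bisecting; that bracketing step is omitted here for brevity.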

Appendix B: Closed-form equations for the budget constraint

This appendix suggests a set of realistic assumptions to express the budget constraint as a series of closed-form equations. As is customary in this literature, the power utility function $u(c) = c^{1-\gamma}/(1-\gamma)$, $\gamma \neq 1$, is used. For the mortality and income assumptions, we opt for functional forms that are flexible enough to fit any discrete mortality table or income profile. For mortality, it is assumed that a constant force of mortality $\mu_j$ applies at each age $j$. For the pre-retirement income function, we assume that the income process is continuous and that it grows at a rate $g_j$ at age $j$. Letting $J(t) = j$ if $j \leqslant t < j + 1$, the survival probability function for $J(t) > t_0$ is given by $p_{t_0, t} = \mathrm{e}^{-\sum_{l = t_0}^{J(t) - 1} \mu_l - \mu_{J(t)} (t - J(t))}$ and the pre-retirement income process by $y_t = y_{J(t)}\, \mathrm{e}^{g_j (t - J(t))}$.
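To make the piecewise assumptions above concrete, the following Python sketch evaluates the survival probability $p_{t_0,t}$ and the pre-retirement income $y_t$ from tables indexed by integer age. The function names, the dictionary-based tables, and all numerical values are hypothetical and chosen only for illustration; $t_0$ is assumed to be an integer age.

```python
import math


def survival_prob(t0: int, t: float, mu: dict) -> float:
    """p_{t0,t} under a constant force of mortality mu[j] within each age j."""
    j = math.floor(t)                        # J(t)
    cum = sum(mu[l] for l in range(t0, j))   # full years t0, ..., J(t) - 1
    return math.exp(-cum - mu[j] * (t - j))  # plus the partial year at age J(t)


def income(t: float, y: dict, g: dict) -> float:
    """y_t = y_{J(t)} * exp(g_{J(t)} * (t - J(t)))."""
    j = math.floor(t)
    return y[j] * math.exp(g[j] * (t - j))


# Hypothetical tables (illustrative values only).
mu = {45: 0.01, 46: 0.02, 47: 0.03}
g = {45: 0.02, 46: 0.01}
y = {45: 50000.0}
y[46] = y[45] * math.exp(g[45])   # keeps the income process continuous at age 46

print(survival_prob(45, 46.5, mu))   # exp(-(0.01 + 0.02 * 0.5))
print(income(45.5, y, g))            # 50000 * exp(0.02 * 0.5)
```

Building `y[46]` from `y[45]` and `g[45]` is one way to respect the continuity assumption; a profile fitted to data would supply the integer-age values directly.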

The budget constraint $W(\lambda, \lambda_i) = 0$ in equation (A.5) is equal to the present value of savings over the interval $[T_1, T_2^i]$. From equations (13) and (4), savings are given by

(B.1)

$$s(c_t^{*}(\lambda)) = \begin{cases} \dfrac{y_t - \pi y_t - \tau_k (y_t - E) - G_k - c_t^{*}(\lambda)}{1 - \alpha \tau_k}, & t < R, \\[2ex] \dfrac{y_R + \mathrm{SS} - \tau_k \left( y_R + M_h (y_R + \mathrm{SS}/2) + H_{t,h} - E_R \right) - G_k - c_t^{*}(\lambda)}{1 - \alpha \tau_k (1 + M_h)}, & t \geqslant R. \end{cases}$$

Note that in equation (B.1), $c_t^{*}(\lambda)$ can take different forms: it can be the interior solution given in equation (22) or it can be $C_k$ or $C_h$ given in equations (14) and (15). Thus, the budget constraint is the sum of the present value of savings over a series of sub-intervals $[t, T]$ where $c_t^{*}(\lambda)$ takes the same form. For each interval $[t, T]$, the present value of savings can be expressed in closed form if we can integrate the present values of the functions $y_t$, $c_t^{*}(\lambda)$, $H_{t,h}$, $B_{t,h}^{S}$, and some constants, which can easily be done as follows:

(B.2)

$$\int_t^T \mathrm{e}^{-r(t - T_1)}\, \mathrm{d}t = \mathrm{e}^{r T_1} a_{t,T}^{r}, \quad \text{where} \quad a_{t,T}^{x} = \left( \mathrm{e}^{-xt} - \mathrm{e}^{-xT} \right) / x,$$

(B.3)

$$\int_t^T \mathrm{e}^{-r(t - T_1)} y_t\, \mathrm{d}t = \mathrm{e}^{r T_1} \sum_{j = J(t)}^{J(T)} \mathrm{e}^{-g_j j}\, y_j\, a_{j,t,T}^{g_j - r}, \quad \text{where} \quad a_{j,t,T}^{x} = \left( \mathrm{e}^{x \min(T, j+1)} - \mathrm{e}^{x \max(t, j)} \right) / x,$$

(B.4)

$$\int_t^T \mathrm{e}^{-r(t - T_1(\lambda))} c_t^{*}(\lambda)\, \mathrm{d}t = \mathrm{e}^{r T_1} \left( \frac{\lambda\, \mathrm{e}^{(r - \beta) t_0}}{1 - \alpha \tau_k (1 + M_h)} \right)^{-1/\gamma} \sum_{j = J(t)}^{J(T)} \left( p_{t_0, j}\, \mathrm{e}^{\mu_j j} \right)^{1/\gamma} a_{j,t,T}^{-r + (r - \beta - \mu_j)/\gamma},$$

(B.5)

$$\int_t^T \mathrm{e}^{-r(t - T_1)} H_{t,h}\, \mathrm{d}t = \begin{cases} \mathrm{e}^{r T_1 + ia}\, a_{t,T}^{r+i}\, H_{a,h}, & h = 1, 2, \\ 85\%\, \mathrm{SS} \cdot \mathrm{e}^{r T_1} a_{t,T}^{r}, & h = 3, \end{cases}$$

(B.6)

$$\int_t^T \mathrm{e}^{-r(t - T_1)} B_{t,h}^{S}\, \mathrm{d}t = \begin{cases} \mathrm{e}^{r T_1 + ia}\, a_{t,T}^{r+i}\, B_{a,h}^{S}, & h = 1, 2, \\ \mathrm{SS} \cdot \mathrm{e}^{r T_1} a_{t,T}^{r} + \mathrm{e}^{r T_1 + ia}\, a_{t,T}^{r+i} \left( 0.5 X_a^{1} + 0.35 X_a^{2} \right) / 0.85, & h = 3. \end{cases}$$

These results can be combined with equations (14), (15), and (B.1) to obtain the present value of savings over an interval $[t, T]$. By multiplying them by $\mathrm{e}^{r(R - T_1)}$, they can also be used to compute the retirement savings, tax liability, and tax subsidy measures in Section 7.
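As an illustrative check of the closed forms for a constant and for income, the Python sketch below implements the discount factor $a_{t,T}^{x}$, read here as the integral of $\mathrm{e}^{-xs}$ over $[t, T]$, i.e. $(\mathrm{e}^{-xt} - \mathrm{e}^{-xT})/x$, together with the year-by-year sum for the present value of income. The function names, the dictionary tables, and the parameter values are assumptions for this example only, and the $x \to 0$ branches are an added numerical safeguard rather than part of the formulas.

```python
import math


def a(x: float, t: float, T: float) -> float:
    """Integral of exp(-x * s) for s in [t, T]; equals T - t in the limit x -> 0."""
    if abs(x) < 1e-12:
        return T - t
    return (math.exp(-x * t) - math.exp(-x * T)) / x


def pv_constant(r: float, t: float, T: float, T1: float) -> float:
    """Present value at T1 of a unit flow over [t, T]."""
    return math.exp(r * T1) * a(r, t, T)


def pv_income(r: float, t: float, T: float, T1: float, y: dict, g: dict) -> float:
    """Present value at T1 of income over [t, T], summed age by age."""
    total = 0.0
    for j in range(math.floor(t), math.floor(T) + 1):
        lo, hi = max(t, j), min(T, j + 1)   # part of [j, j + 1) inside [t, T]
        if hi <= lo:
            continue                        # empty overlap, e.g. when T is an integer
        x = g[j] - r
        seg = (hi - lo) if abs(x) < 1e-12 else (math.exp(x * hi) - math.exp(x * lo)) / x
        total += math.exp(-g[j] * j) * y[j] * seg
    return math.exp(r * T1) * total


# Hypothetical inputs (illustrative values only).
r, T1 = 0.03, 45.0
g = {45: 0.02, 46: 0.01, 47: 0.0}
y = {45: 50000.0}
y[46] = y[45] * math.exp(g[45])   # continuity of income at integer ages
y[47] = y[46] * math.exp(g[46])

pv_c = pv_constant(r, 45.5, 48.0, T1)
pv_y = pv_income(r, 45.5, 48.0, T1, y, g)
print(pv_c, pv_y)
```

Splitting the integral at integer ages mirrors the $\min(T, j+1)$ and $\max(t, j)$ terms in the income formula, so each piece has a constant growth rate and integrates exactly.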

Appendix C: Value function

The welfare gain is computed by first determining the percentage increase in consumption over the interval $[T_1, T_2]$ that would make the individual indifferent between the Roth and traditional cases. Except for the restriction to the interval $[T_1, T_2]$, this approach is equivalent to that presented in other works such as Cocco et al. (2005).³⁵ To obtain a dollar measure, the percentage increase is multiplied by the value of consumption $C_{T_1, T_2}^{\mathrm{Roth}}$. Letting $V_{T_1, T_2} = \int_{T_1}^{T_2} f(t)\, u(c_t^{*})\, \mathrm{d}t$ denote the value function, the welfare measure can be expressed as

(C.1)

$$\text{Welfare gain/loss with traditional account} = \frac{\left( V_{T_1,T_2}^{\mathrm{Trad}} \right)^{1/(1-\gamma)} - \left( V_{T_1,T_2}^{\mathrm{Roth}} \right)^{1/(1-\gamma)}}{\left( V_{T_1,T_2}^{\mathrm{Roth}} \right)^{1/(1-\gamma)}}\, C_{T_1,T_2}^{\mathrm{Roth}}.$$
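A small Python sketch of this welfare measure follows. It is written in the ratio form $\left[ (V^{\mathrm{Trad}}/V^{\mathrm{Roth}})^{1/(1-\gamma)} - 1 \right] C^{\mathrm{Roth}}$, which matches (C.1) whenever both value functions share the same sign and avoids taking fractional powers of negative numbers when $\gamma > 1$ (power utility is then negative). The function name and the numerical inputs are hypothetical.

```python
def welfare_gain(v_trad: float, v_roth: float, c_roth: float, gamma: float) -> float:
    """Dollar welfare gain (or loss, if negative) of the traditional account over the Roth."""
    if gamma == 1.0:
        raise ValueError("power utility requires gamma != 1")
    ratio = v_trad / v_roth
    if ratio <= 0.0:
        raise ValueError("value functions must be nonzero and share the same sign")
    # Percentage change in consumption that equates the two value functions,
    # scaled by the value of Roth consumption over [T1, T2].
    return (ratio ** (1.0 / (1.0 - gamma)) - 1.0) * c_roth


# Hypothetical check: scaling consumption by 2% multiplies V by 1.02 ** (1 - gamma),
# so the measure should recover 2% of the Roth consumption value.
gamma = 3.0
v_roth = -2.0
v_trad = v_roth * 1.02 ** (1.0 - gamma)
gain = welfare_gain(v_trad, v_roth, 100000.0, gamma)
print(gain)
```

The check relies on the homogeneity of power utility: a uniform proportional change in consumption rescales the value function by a known factor, which the exponent $1/(1-\gamma)$ undoes.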

Footnotes

The author would like to thank an anonymous referee, Nikhil Varaiya, and Stefano Gubellini for helpful comments.

¹ See Dammon (2009) for an analysis of the conversion decision.

² Roth accounts have other advantages outside of this comparison. For instance, with traditional accounts the 10% penalty tax for early withdrawals applies to the entire withdrawal. For Roth accounts, the penalty is smaller as it applies only to the investment income portion. Roth accounts do not require minimum distributions starting at age 70½ and are perceived as better instruments to pass money tax-free to heirs. Another point raised in Burman et al. (2001) is that back-loaded options might be preferable when equal contribution limits apply because they shelter more money for retirement.

³ These multipliers do not apply to those with either very low or very high incomes. Note that the idea that the taxation of Social Security benefits can increase lifetime taxes with 401(k)s was introduced in Gokhale et al. (2001). This paper's contribution is to incorporate this concept within the analytical solution of a life-cycle model.

⁴ This is for the case with standard deductions and no tax penalty. Roth accounts could also be recommended for those who fall in a lower tax bracket because of especially high deductions or a temporary loss of income.

⁵ He shows that Roth accounts are superior when tax rates do not decrease after retirement as they are less risky. As the spread between pre- and post-retirement tax rates increases, a greater proportion of savings should be allocated to traditional accounts.

⁶ In the scenario without risk, those who gain from a mixed strategy are those who are subject to higher marginal tax rates after retirement and are able to reduce these rates by taking smaller withdrawals from traditional accounts.

⁷ The previous literature (e.g. Engen et al. 1994; Laibson et al. 1998; Gomes et al. 2009) has examined the impact of traditional accounts over taxable accounts. Our results are not directly comparable as traditional accounts are benchmarked against Roth accounts instead of taxable accounts, effectively measuring the differential impact of the tax deductibility of contributions.

⁸ The model does not currently include State taxes because of the large number of cases to consider. For those who pay taxes before retirement only, including State taxes would imply higher tax subsidies and a greater incentive to save for retirement. The impact would be more limited for those who pay taxes after retirement if marginal tax rates increase by the same amount before and after retirement.

⁹ There is a maximum of 50% of Social Security benefits that apply to this taxable amount. This rule is not incorporated in the model.

¹⁰ In (14), $h$ is such that $C_k \in (C_h, C_{h+1})$; in (15), $k$ is such that $y_t^{tx}(C_h) \in [B_k, B_{k+1})$. $C_k$ and $C_h$ are defined only when positive.

¹¹ In $\Lambda_t(c)$, $k$ is such that $y_t^{tx}(s_t(c)) \in [B_k, B_{k+1})$ and $h$ is such that $\mathrm{PI}(s_t(c)) \in [B_{t,h}^{S}, B_{t,h+1}^{S})$.

¹² Let $\phi(\lambda) = L(c_1, \lambda) - L(c_2, \lambda)$ and $\bar{\Lambda}$ be such that $\phi(\bar{\Lambda}) = 0$. Since $\phi'(\lambda) > 0$, $c_1$ maximizes $L$ when $\lambda > \bar{\Lambda}$ and $c_2$ maximizes $L$ when $\lambda < \bar{\Lambda}$. For $C = C_{h=3}$, it can be shown that $\phi(\Lambda(C^{-})) < 0$ and $\phi(\Lambda(C)) > 0$, which guarantees the existence of a unique $\bar{\Lambda} \in (\Lambda(C^{-}), \Lambda(C))$.

¹³ While a formal extension to the case with multiple risky periods is left for future research, we can give the intuition here by adding one more risky period. For example, if after each state $i$ there are $J$ possible states $j = 1, \ldots, J$, there would be an additional set of $N \cdot J$ constants $\lambda_{i,j}$ and $N$ conditions $\lambda_i = \sum_{j=1}^{J} p_j \lambda_{i,j}$. The budget constraint would take the form $W(\lambda, \lambda_i, \lambda_{i,j}) = 0$ and the problem would reduce to a system of $N \cdot J = M$ equations in $M$ unknowns $\lambda_{i,j}$ because $\lambda$ and $\lambda_i$ can be expressed in terms of $\lambda_{i,j}$.

¹⁴ The survival probabilities are derived using standard actuarial techniques and adding the number of deaths for males and females to obtain unisex rates.

¹⁵ Those with some college but no degree are not included in these categories. The estimation includes PSID waves from 1970 to 2007. Results are based on median values for the household head's labor income variable, with an adjustment to 2010 dollars using the Consumer Price Index for urban wage earners and clerical workers. The data includes households headed by both males and females whose employment status is either working now, temporarily laid-off, or unemployed looking for work. A third-degree polynomial is fitted to the results. The resulting income profiles are similar to those reported in 2010 dollars in Brown et al. (2011).

¹⁶ The PIA (Primary Insurance Amount) is 90% of the AIME (Average Indexed Monthly Earnings) up to $761, plus 32% of the excess up to $4,856, and 15% of the excess over $4,856. The AIME is computed by taking the average of the highest 35 years of income indexed with average growth in wages. For simplicity, we assume that this is equal to inflation. The PIA is reduced by 6.66% because the individual retires at age 65 and it is assumed that the full retirement age is 66.

¹⁷ The incentive to save for retirement can be lost if the combination of pension income with Social Security is such that replacement rates are too high. The ‘highest pension income possible’ was determined by finding the highest $y_R$ such that the problem still had a solution with positive retirement savings.

¹⁸ For example, for someone age 45 in 2010 with no pension income, the respective values of $T_1$ in the Roth and traditional cases are: 33.8 versus 33.7 (LHS), 33.9 versus 34.5 (HS), and 34.8 versus 35.2 (college). For $T_2$, the corresponding values are 88.1 versus 89.1 (LHS), 89.5 versus 90.6 (HS), and 91.8 versus 93.1 (college). For other cases, differences are typically within a year.

¹⁹ It should be noted that in practice, those in the older cohorts did not have access to traditional and Roth accounts for their entire career.

²⁰ Traditional accounts can be optimal in some cases with higher marginal tax rates because different marginal tax rates can apply to withdrawals.

²¹ At the same time, those with a college degree and y_R = $24,000 no longer prefer the Roth because they reached the maximum amount of taxable benefits and return to regular tax rates of 15% and 25%.

²² These figures are a lower bound for the value of the option given that we are using a naïve diversification strategy instead of optimizing α in every period.

²³ Issues with maximum contributions or eligibility could make partial strategies optimal.

²⁴ Although the 10% penalty tax applies to the entire amount of the withdrawal for a traditional account, for Roth accounts only the investment income portion is subject to the penalty.

²⁵ To see this, let $\tau_1$ and $\tau_2$ be the marginal tax rates at times $t_1$ and $t_2$. A dollar invested at time $t_1$ and withdrawn at time $t_2$ yields $\mathrm{e}^{r(t_2 - t_1)} (1 - \tau_2)/(1 - \tau_1)$ in the traditional case and $\mathrm{e}^{r(t_2 - t_1)}$ in the Roth case. If $\tau_2 < \tau_1$, it is advantageous to save in the traditional account for a longer period of time $t_2 - t_1$.

²⁶ In the solution presented in Section 3, an increase in wealth reduces λ and consequently increases consumption in all periods.

²⁷ Note that this decomposition indicates that the impact of a policy on saving behavior cannot be asserted simply by examining the change in pre-retirement consumption as this amount also includes a tax subsidy portion. For example, Gomes et al. (2009) concluded that tax-deferred accounts do not promote a greater propensity to save because they increased pre-retirement consumption by 2% on average. If the 2% includes a tax subsidy, a positive substitution effect may be present but hidden.

²⁸ To write (A.2) in a form similar to He and Pages (1993), we substitute $\int_{t_0}^{t} \mathrm{e}^{r(t - t_0)} s_t\, \mathrm{d}t$ for $W_t$ in (28).

²⁹ Under certain technical conditions (see Boyd and Vandenberghe, 2004), the value function with the dual approach is given by $V^{*} = \inf_{X \in \mathrm{D}} \sup_{c > 0} L(c) = \inf_{X \in \mathrm{D}} L(c^{*})$ where $\mathrm{D}$ is the set of non-negative and decreasing functions.

³⁰ This follows from the third KKT condition in Appendix A.1 that requires that $\eta_t = 0$ when $W_t^{*} > 0$ and from the equation for $X(t)$ in (A.2).

³¹ This result follows from two conditions: (i) since $X(t)$ is decreasing, $\lambda(T_1^{-}) \geqslant \lambda$ and $\lambda \geqslant \lambda(T_2)$, and (ii) to ensure that the individual is saving (dissaving) at the beginning (end) of $[T_1, T_2]$, we must have $\lambda > \lambda(T_1^{+})$ and $\lambda < \lambda(T_2^{-})$. If $\lambda(t)$ is continuous at $T_1$ and $T_2$, these conditions imply $\lambda(T_1) = \lambda = \lambda(T_2)$. If there is a discontinuity at $T_1$ or $T_2$, the condition is adjusted to $\lambda(T_1^{-}) \geqslant \lambda > \lambda(T_1)$ or $\lambda(T_2^{-}) \geqslant \lambda > \lambda(T_2)$.

³² More precisely, $W_R(\lambda)$ is the accumulated value of $s(c_t^{*}(\lambda))$ with interest over the interval $[T_1(\lambda), R)$. Setting $W_R(\lambda)$ equal to the present value of $s(c_t^{*}(\lambda_i))$ over the interval $[R, T_2^i)$, we can solve for $\lambda_i$ with $\lambda(T_2^{i-}) > \lambda_i \geqslant \lambda(T_2^i)$. If there is only one period where $\lambda(t)$ is decreasing after retirement, there is a unique solution to this equation. With multiple decreasing periods there can be more than one solution; in that case, the approach in Lachance (2012) can be used and the optimal one is the one with the highest $\lambda_i$.

³³ Lachance (2012) gives a detailed algorithm to handle the case with any number of periods where $\lambda(t)$ is decreasing. This solution would apply before retirement until only one period where $\lambda(t)$ is decreasing remains.

³⁴ If $\bar{\lambda} > \lambda(t_0)$, only the case $T_1 = t_0$ can apply and $g(\bar{\lambda}) < 0$ guarantees the existence of a unique solution $\lambda \in (\bar{\lambda}, \infty)$ to $g(\lambda) = 0$.

³⁵ The binding periods are excluded from the calculation because including them biases the welfare measure. To see this, consider that by construction, the marginal utility of consuming an additional dollar in the binding period $[t_0, T_1]$ is higher than in the non-binding period $[T_1, T_2]$. As a result, when the binding period $[t_0, T_1]$ is included in the welfare calculation, a smaller increase in consumption is required to match a given welfare gain. Including the binding period $[T_2, \omega]$ creates the opposite problem as marginal utility is lower in that period than in $[T_1, T_2]$.

References

Ahern, M., Americks, J., Dickson, J., Nestor, R. and Utkus, S. (2005) Tax diversification and the Roth 401(k). Vanguard Center for Retirement Research, 18.

Boyd, S. and Vandenberghe, L. (2004) Convex Optimization. Cambridge: Cambridge University Press.

Brown, J.R., Fang, C. and Gomes, F. (2011) Risk and returns to education: The value of human capital. Working Paper, University of Illinois at Urbana-Champaign.

Burman, L.E., Gale, W.G. and Weiner, D. (2001) The taxation of retirement saving: choosing between front-loaded and back-loaded options. National Tax Journal, 54(3): 689–702.

Butterfield, S.L., Jacobs, F.A. and Larkins, E.R. (2000) The Roth versus the traditional IRA: a comparative analysis. Journal of Applied Business Research, 16(4): 113–128.

Cocco, J., Gomes, F. and Maenhout, P. (2005) Consumption and portfolio choice over the life cycle. Review of Financial Studies, 18(2): 491–533.

Dammon, R.M. (2009) Retirement investing: analyzing the Roth conversion option. Tepper School of Business, Paper 104.

Dickson, J.M. (2004) Pension taxation and tax code risk. In Gale, E.G., Shoven, J.B., and Warshawsky, M.J. (eds). Private Pensions and Public Policies, Brookings Institution Press, 221–231.

Engen, E.M., Gale, W.G. and Scholz, J.K. (1994) Do savings incentive work? Brookings Papers on Economic Activity, 25(1): 85–180.

Gokhale, J., Kotlikoff, L.J. and Neumann, T. (2001) Does participating in a 401(k) raise your lifetime taxes? National Bureau of Economic Research, Working paper 8341.

Gomes, F., Michaelides, A. and Polkovnichenko, V. (2009) Optimal savings with taxable and tax-deferred accounts. Review of Economic Dynamics, 12: 718–735.

He, H. and Pages, H.F. (1993) Labor income, borrowing constraints, and equilibrium asset prices. Economic Theory, 3: 663–696.

Kotlikoff, L.J., Marx, B. and Rapson, D. (2008) To Roth or not? That is the question. National Bureau of Economic Research, Working paper 13763.

Lachance, M.E. (2012) Optimal onset and exhaustion of retirement savings in a life-cycle model. Journal of Pension Economics and Finance, 11(1): 21–52.

Laibson, D.I., Repetto, A. and Tobachman, J. (1998) Self-control and saving for retirement. Brookings Papers on Economic Activity, 91–172.

Love, D.A. (2007) What can the life-cycle model tell us about 401(k) contributions and participation? Journal of Pension Economics and Finance, 6(2): 1473–1492.

Yaari, M.E. (1965) Uncertain lifetime, life insurance, and the theory of the consumer. Review of Economic Studies, 32(3): 137–150.

Figure 1. Illustration of jump in optimal consumption (optimal $c_t$ maximizes $L(c_t)$).

Figure 2. Income profiles by education.

Table 1. Marginal tax rates

Figure 3. Examples of optimal consumption patterns.

Table 2. Welfare gains/losses and related measures without tax risk

Figure 4. Welfare gains with naïve diversification strategies (over Roth accounts) for selected cases (age 45 in 2010).

Table 3. Retirement savings at age 65 (in dollars; age 45 in 2010)

Table 4. Increase in retirement consumption
