1 Introduction
A number of recent developments have sparked renewed interest in Roth retirement accounts. First, increasing government deficits can lead to higher tax rates in the future, reducing the appeal of traditional tax-deferred Individual Retirement Arrangements (IRAs) and 401(k)s. Second, new provisions in the tax code encourage the conversion of traditional accounts into Roth accounts.Footnote 1 Starting in 2010, the $100,000 modified adjusted gross income (MAGI) limit for conversions has been removed and a special clause allows those who make the conversion in 2010 to split the proceeds equally between their 2011 and 2012 tax returns. Third, designated Roth accounts were introduced in the 401(k) arena in 2006. Although employer matching contributions are still restricted to traditional accounts, employees’ contributions can be allocated to a Roth version of a 401(k) or 403(b).
This paper formally examines the differences between the front-loaded (traditional) and back-loaded (Roth) approaches to retirement accounts within the context of a life-cycle model. Building on Yaari's (1965) classic framework with uncertain lifetime and borrowing constraints, the following realistic tax features are added: tax brackets and deductions, taxation of Social Security benefits, impact of retirement contributions/withdrawals on taxable income, and tax risk after retirement. This paper contributes to the literature by providing an analytical solution to the model for three cases: traditional account only, Roth account only, and any fixed combination of traditional and Roth accounts.
The dual approach proposed in this paper is a departure from more conventional dynamic programming techniques used to solve complex problems. This approach has three advantages: it better handles the discontinuity issues introduced by the realistic tax structure, it produces exact values, and it is fast. Working with accurate results is particularly important when examining small differences between two tax systems. The disadvantage of dynamic programming solutions is that they are based on unknown value functions, which must be estimated with time-consuming numerical optimization and interpolation. The broader methodological contribution of this paper is to show that the solution can be extracted instead from a set of known budget constraint equations, which eliminates the need to estimate unknown value functions and yields exact values. Further, the structure of the solution allows us to express the budget constraint with a series of closed-form equations. In terms of limitations, the model's main drawback is that it does not currently incorporate sources of background risk such as income risk, risky asset returns, or medical expense risk.
The solution derived in this paper allows us to investigate three questions: Who benefits from Roth accounts? How does tax risk affect the comparison? How does a policy of tax-deductible contributions affect retirement savings? To answer these questions, we follow an approach commonly used in this literature and illustrate the model's solution with realistic income profiles for three education groups: less than high school, high school, and college. We also add a new dimension to the results by presenting them for three age cohorts (ages 25, 45, and 65 in 2010) and different levels of pension income.
When comparing traditional and Roth accounts, the standard argument is that traditional front-loaded accounts are a better option when marginal tax rates decline after retirement (see e.g. Engen et al. 1994).Footnote 2 The conventional wisdom is that this is generally the case and accordingly, traditional accounts have been favored historically. For example, Butterfield et al. (2000) conclude that ‘traditional IRAs have significant wealth accumulation advantages over Roth IRAs in all but rare circumstances’. Our solution shows that this is not necessarily the case when the following tax loophole is taken into account: withdrawals from traditional accounts (but not from Roth accounts) are considered when determining the amount of taxable Social Security benefits. For those affected, this implies that marginal tax rates after retirement effectively increase by either 50% or 85%.Footnote 3 Depending on the specific combination of tax rates, this either mitigates the tax decline after retirement or leads to an increase.
By contrasting expected utility in the pure Roth and traditional cases for the scenarios considered, we find that traditional accounts provide unambiguous benefits over their Roth counterparts mostly for those who pay little or no tax after retirement.Footnote 4 For those at the other end of the spectrum with higher incomes and pensions, traditional accounts generate only small gains or losses due to the taxation of Social Security benefits. Currently, this issue affects mostly retirees in the group with a college degree, but the situation will be different for the younger generation as Social Security's taxation thresholds are not indexed for inflation.
Roth accounts may also become the preferred option if future tax rates increase. For instance, Kotlikoff et al. (2008) showed that they would be more appealing in a 30% tax hike scenario. We perform a different exercise by solving for the breakeven percentage increase in future tax rates required to make individuals indifferent between Roth and traditional accounts. For the group who pays no or minimal taxes after retirement, traditional accounts are still optimal even with a significant increase in taxes. For those who pay meaningful taxes, we find that the breakeven increases range from 4% to 41%. Although a modest increase in future taxes could tip the scale in favor of Roth accounts for those with high pensions, a more radical change would be required for those without pensions.
To address concerns about tax risk, diversified strategies that mix traditional and Roth accounts have been commonly suggested. Yet, little formal evidence has been provided to support them. Dickson (2004) offers one of the rare models to include tax variability, using a two-period model with both traditional and Roth accounts.Footnote 5 In this paper, we extend the concept of tax variability to a life-cycle framework, which allows us to measure the benefits of tax diversification in a realistic setup. We consider a simple naïve diversification strategy where, in every period, the individual allocates the same fraction of his savings/withdrawals to the traditional and Roth accounts. We solve for the optimal fraction to invest in traditional accounts in the cases with and without tax risk and find that the optimal allocation is essentially the same in both cases. In the scenario without tax risk, the option to invest a fraction of savings in Roth accounts can be worth a few thousand dollars because it can help avoid higher marginal tax rates. In contrast, risk reduction benefits have more limited impact on the allocation decision because they amount to a few hundred dollars at best.Footnote 6
Finally, our model allows us to examine the differential impact of traditional accounts on retirement savings and consumption.Footnote 7 Our results indicate that, although traditional accounts can increase gross retirement savings substantially for those with a college degree (by up to $68,000 in the baseline scenario), most of this increase vanishes once we take into account the present value of taxes that will have to be paid on withdrawals. In other words, for that group traditional accounts increase the size of assets under management, but not necessarily retirement consumption. Those in the less-than-high-school group present a different story as they do not pay taxes after retirement and actually display the highest increase in retirement consumption at $15,000. Decomposing the increase into an income effect (from the tax subsidy) and a substitution effect (from increased savings before retirement), we find that the substitution effect has more impact on retirement consumption because a large fraction of the tax subsidy is used to increase consumption before retirement.
The remainder of this paper is structured as follows. Section 2 describes the model and Section 3 provides the solution. Section 4 lists the assumptions used in the numerical illustrations and Section 5 illustrates some representative cases. Section 6 analyzes who gains from Roth accounts and considers the potential benefits of mixed strategies. Section 7 details the impact of tax deductible contributions on the level of retirement savings. Section 8 concludes with some suggestions for applications and future research.
2 Model
The model builds on Yaari's (1965) classic life-cycle framework, which features borrowing constraints and an uncertain lifetime. The model's contribution to the life-cycle literature is to incorporate many realistic features of the tax treatment of retirement savings, while maintaining an analytical structure. First, the model can represent either Roth or traditional accounts. In the Roth case, contributions and withdrawals do not affect taxes. In the traditional case, contributions are deductible from taxable income and withdrawals are taxed. The accounts can be part of an IRA or a 401(k). Second, while many models use a single fixed tax rate, this model reflects the United States tax structure with various tax brackets and deductions. Third, the model lets tax rates be endogenously determined rather than being specified exogenously. Fourth, the model incorporates the actual rules for the taxation of Social Security benefits. Fifth, the model allows for tax risk after retirement.
Although it would also be interesting to let the individual choose between Roth and traditional accounts in each period, this setup would make the problem more cumbersome to solve. The model can be used, however, to offer a solution for the simpler case of a naïve diversification approach. With that strategy, the proportions allocated to the traditional and Roth accounts are, respectively, α and 1 − α in every period. This solution allows us to illustrate the potential value of diversification in a context with tax risk. In the remainder of the text, all equations are given as a function of α since the pure cases can be viewed as special cases as follows:
$$\alpha = \begin{cases} 1 & \text{traditional account only,} \\ 0 & \text{Roth account only.} \end{cases} \tag{1}$$
To balance the increased complexity brought by realistic taxes, straightforward assumptions are used for the rest of the model. This approach has the advantage of leading to a solution with exact values, which lends itself well to analysis. For the elements not included in this model, the reader is referred to: Cocco et al. (2005) (income risk and portfolio choice), Love (2007) (employer contributions), and Kotlikoff et al. (2008) (married individuals).
2.1 Economic and demographic assumptions
The individual is age $t_0$ when the problem starts and he can live up to age ω. The probability that he survives from age $t_0$ to age t is denoted by $p_{t_0,t}$. It is assumed that $p_{t_0,t}$ is continuous, decreases with time, and eventually converges to zero as t → ω. The utility of consumption is represented by an increasing and concave function u(c) with u′(c) > 0, u″(c) < 0, $\lim_{c \to 0} u'(c) = \infty$, and $\lim_{c \to \infty} u'(c) = 0$. Time preferences are taken into account by discounting utility at a continuous rate β. The combined discount from time and mortality is denoted by

All economic assumptions are expressed in real terms. Savings grow at a real risk-free rate r > 0. The wealth process is denoted by $W_t$ and it is assumed that the individual starts the problem with no initial savings, i.e. $W_{t_0} = 0$. Borrowing is not allowed. Before retirement, pre-tax income is a continuous function denoted by $y_t$. The individual retires at age R, which is assumed to be 65 years old. After retirement, the individual receives a Social Security pension with annual payments of SS > 0. He may also receive annual income $y_R \geqslant 0$ from an employer pension or another source of annuity payments, for a total of $y_t = y_R + \mathrm{SS}$ per year. The solution considers only the case where expected after-tax income declines at the time of retirement.
2.2 Taxation
Before retirement, income is subject to a payroll tax rate π. At all times, federal income taxes based on the United States tax bracket system apply.Footnote 8 For k = 0, …, K, the brackets are denoted by $[B_k, B_{k+1})$ and the marginal tax rate within bracket k is $\tau_k$. Tax rates after retirement are subject to a one-time multiplicative shock θ that applies to all marginal tax rates. To represent this risk, the model allows any discrete probability distribution for θ. Let N be the number of possible states; there is a probability $p_i$, with i = 1, …, N, that $\theta = \theta_i$ and $\tau_k^i = \theta_i \tau_k$. The case without tax risk can be viewed as a special case of this model with N = 1 and $\theta_1 = 1$. For the remainder of this paper, all functions of $\tau_k$ should be interpreted as random variables after retirement although the notation is not differentiated.
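As a rough illustration of the bracket structure and of the multiplicative shock θ, the sketch below computes the tax owed on a given level of taxable income. The names `BRACKETS`, `bracket_tax`, and `marginal_rate` are hypothetical, and the bracket values are assumed to be the 2010 single-filer figures; they should be checked against Table 1 before being relied on.

```python
# Hypothetical sketch: progressive bracket tax with a multiplicative shock theta.
# Lower bounds B_k and rates tau_k are ASSUMED 2010 single-filer values
# (verify against Table 1 of the paper).
BRACKETS = [
    (0.0, 0.10),
    (8_375.0, 0.15),
    (34_000.0, 0.25),
    (82_400.0, 0.28),
    (171_850.0, 0.33),
    (373_650.0, 0.35),
]


def bracket_tax(taxable_income, theta=1.0, brackets=BRACKETS):
    """Tax owed when every marginal rate tau_k is scaled to theta * tau_k."""
    tax = 0.0
    for k, (lower, rate) in enumerate(brackets):
        # The upper bound of bracket k is the lower bound of bracket k + 1.
        upper = brackets[k + 1][0] if k + 1 < len(brackets) else float("inf")
        if taxable_income <= lower:
            break
        tax += theta * rate * (min(taxable_income, upper) - lower)
    return tax


def marginal_rate(taxable_income, theta=1.0, brackets=BRACKETS):
    """Marginal rate theta * tau_k for the bracket [B_k, B_{k+1}) containing the income."""
    rate = 0.0  # income below the first bound is treated as untaxed
    for lower, tau in brackets:
        if taxable_income >= lower:
            rate = tau
    return theta * rate
```

Setting `theta=1.0` corresponds to the case without tax risk (N = 1 and θ1 = 1); a tax-risk scenario would evaluate the same function at each possible θi.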
To compute taxable income, the standard deduction and personal exemption must be taken into account. The sum of these two components is denoted by E before age 65 and by $E_R$ after age 65 (there is an increase in the standard deduction at age 65). In the traditional case, it must also be recognized that taxable income is reduced by contributions before retirement and increased by withdrawals after retirement. The contributions or savings before retirement are denoted by $s_t$, where $s_t = y_t - c_t - \mathrm{tax}_t(s_t)$ and $c_t$ is consumption. A negative $s_t$ represents a withdrawal. Letting $\mathrm{SS}_t^{\mathrm{tx}}$ denote the taxable portion of Social Security benefits, the taxable income function is given by

Accordingly, for k = 0, …, K, the tax function is

where

It should be noted that some of the features of the tax system are not included in the model because they do not affect the solution in most cases illustrated here. For instance, the model does not limit the amount of tax deductible contributions because the optimal savings in all our illustrations are below the 401(k)’s $16,500 limit. In addition, the 10% penalty tax for early withdrawals before age 59½ is not incorporated because the model does not feature shocks that would trigger them. This eventuality is discussed later in Section 6.3.
The model is completed by specifying $\mathrm{SS}_t^{\mathrm{tx}}(s_t)$, the taxable portion of Social Security benefits. This is a fairly complex formula. First, a provisional income measure $\mathrm{PI}_t \equiv \mathrm{PI}(s_t)$ is defined by adding half of Social Security benefits to other sources of income (including withdrawals from traditional accounts, but not Roth accounts), yielding:

Once $\mathrm{PI}_t$ is computed, it has to be compared to two ‘base amounts’ $X^1$ and $X^2$. Currently, these base amounts are $X^1$ = $25,000 and $X^2$ = $34,000 for singles; the corresponding measures are $32,000 and $44,000 for those married filing jointly. These amounts are not indexed for inflation. Given that the rest of the model is expressed in real terms, these thresholds have to be adjusted for inflation (denoted by i) and this creates cohort effects in the model. For example, for somebody retiring at age 65 in 2010, the first base amount in real terms is $25,000. For somebody age 45 in 2010, this same base amount will be $13,720 when they retire in 20 years (assuming i = 3%), making it more likely that they will be taxed. For a given individual, the base amounts will also decrease by a factor of $e^{-it}$ in real terms after t years of retirement. Thus, letting a be the current age in 2010, the base amounts in real terms can be expressed by
$$X_t^j = X^j e^{-i(t-a)}, \quad j = 1, 2. \tag{7}$$
If the provisional income $\mathrm{PI}_t$ is less than the first base amount $X_t^1$, none of the Social Security benefits are taxable. If $\mathrm{PI}_t$ is greater than the first base amount, but lower than the second, 50% of the excess $\mathrm{PI}_t - X_t^1$ is taxable income.Footnote 9 If $\mathrm{PI}_t$ is greater than the second base amount, then the taxable portion is $50\%(X_t^2 - X_t^1) + 85\%(\mathrm{PI}_t - X_t^2)$, subject to a maximum of 85% of Social Security benefits. Thus, depending on the level of $\mathrm{PI}_t$, there are four different formulas that can apply: 0, $50\%(\mathrm{PI}_t - X_t^1)$, $50\%(X_t^2 - X_t^1) + 85\%(\mathrm{PI}_t - X_t^2)$, and 85% SS. To summarize, these four cases can be embedded into one linear structure:

where

The key notation to remember for the rest of the analysis is $M_h$, which represents the marginal rate of inclusion of Social Security benefits in taxable income. From equations (6) and (8), a dollar withdrawn from a traditional account increases the taxable portion of Social Security benefits by $M_h$ and triggers an additional tax of $\tau_k M_h$. Thus, $M_h$ can be viewed as a factor that magnifies the marginal tax rate from $\tau_k$ to $\tau_k(1+M_h)$. For the remainder of this paper, $\tau_k(1+M_h)$ will be referred to as the effective marginal tax rate. For example, if $\tau_k$ = 15% and $M_h$ = 85%, withdrawals from traditional accounts are effectively taxed at a rate of 15% × (1 + 85%) = 27.75%.
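The four-case rule and the resulting effective marginal tax rate can be sketched in a few lines. This is only an illustration of the simplified rule as described in the text, not of the full IRS worksheet, and the function names (`taxable_social_security`, `inclusion_rate`, `effective_marginal_rate`) are hypothetical.

```python
def taxable_social_security(provisional_income, ss, x1, x2):
    """Taxable benefits under the simplified four-case rule described in the text."""
    if provisional_income <= x1:
        return 0.0
    if provisional_income <= x2:
        return 0.5 * (provisional_income - x1)
    uncapped = 0.5 * (x2 - x1) + 0.85 * (provisional_income - x2)
    # The taxable portion can never exceed 85% of benefits.
    return min(uncapped, 0.85 * ss)


def inclusion_rate(provisional_income, ss, x1, x2):
    """Marginal inclusion rate M_h: 0, 50%, 85%, then 0 again once the cap binds."""
    if provisional_income < x1:
        return 0.0
    if provisional_income < x2:
        return 0.5
    uncapped = 0.5 * (x2 - x1) + 0.85 * (provisional_income - x2)
    return 0.85 if uncapped < 0.85 * ss else 0.0


def effective_marginal_rate(tau, m):
    """Effective marginal tax rate tau * (1 + M) on traditional withdrawals."""
    return tau * (1.0 + m)
```

With a 15% bracket and an 85% inclusion rate, `effective_marginal_rate(0.15, 0.85)` reproduces the 27.75% figure quoted above.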
To put this in perspective, recall that the basis for favoring traditional accounts is that marginal tax rates after retirement should be lower than before retirement. Once the taxation of Social Security benefits is taken into account, this logic may no longer hold as it is possible to have lower income after retirement, but higher tax rates. This issue does not necessarily affect everyone: those with either very low or very high incomes are not impacted because their inclusion rate $M_h$ is zero. It should be kept in mind that the composition of this group varies across cohorts because the base amounts change in real terms. Initially, those with higher incomes are subject to the 50% or 85% inclusion rates. Eventually, they will see their $M_h$ decline to zero when their taxable benefits reach 85% SS. On the other hand, those with lower incomes start with $M_h = 0$ but ultimately see their inclusion rates increase to 50% and 85% as the base amounts decrease in real terms.
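The cohort effect can be made concrete by deflating a nominal base amount at a continuous inflation rate, which is the convention implied by the $13,720 example in the text. The helper name `real_base_amount` and its argument names are hypothetical.

```python
import math


def real_base_amount(nominal, age, age_in_2010, inflation=0.03):
    """Base amount in real (2010) terms once the individual reaches `age`.

    Assumes the nominal threshold stays fixed while prices grow at a
    continuous rate `inflation`, so the real value decays exponentially
    with the number of years elapsed since 2010.
    """
    years_elapsed = age - age_in_2010
    return nominal * math.exp(-inflation * years_elapsed)


# First base amount at retirement (age 65) for three cohorts.
for cohort_age in (65, 45, 25):
    print(cohort_age, round(real_base_amount(25_000.0, 65, cohort_age)))
```

The younger the cohort, the lower the real threshold at retirement, which is why the taxation of benefits eventually reaches lower-income groups.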
2.3 Optimization problem
Combining these assumptions, the individual's problem is to choose consumption to maximize his expected utility, subject to a budget constraint and borrowing constraints. Thus, the optimization problem is

such that

and

where $\mathrm{tax}_t(s_t)$ is given in (4) and $s_t(c_t)$ by

for $B_k \leqslant y_t^{\mathrm{tx}}(s_t) < B_{k+1}$ and $B_{t,h}^S \leqslant \mathrm{PI}(s_t) < B_{t,h+1}^S$ with $y_t^{\mathrm{tx}}(s_t)$ given in (3) and $\mathrm{PI}(s_t)$ given in (6).
2.4 Discontinuity points for $\tau_k$ and $M_h$
The rates $\tau_k$ and $M_h$ in this model are step functions of consumption and these discontinuities affect the structure of the solution in the next section. To facilitate the exposition of the solution, the notations $C_k$ and $C_h$ are introduced to identify the levels of consumption where, respectively, the marginal tax rate jumps from $\tau_{k-1}$ to $\tau_k$ and the rate of inclusion of Social Security benefits jumps from $M_{h-1}$ to $M_h$. We can solve for $C_k$ by setting taxable income equal to the next tax bracket (i.e. $y_t^{\mathrm{tx}}(s(C_k)) = B_k$) and inverting $C_k$ from equation (13) as follows:Footnote 10

Similarly, $C_h$ is derived by setting the provisional income in equation (6) equal to the next threshold $B_{t,h}^S$ to obtain:

3 Solution
We derive a complete solution and proof for the optimization problem in (10)–(13). This section outlines the solution's key components and we refer to Appendix A for additional technical details. Readers less interested in the derivation of the results can proceed to the illustrations in Section 4. In a problem with borrowing constraints, the binding periods with $W_t^\ast = 0$ must be identified because the solution is different in these periods as the individual simply consumes all his after-tax income. To simplify the exposition in this section, we consider the case where the individual starts saving for retirement at age $T_1$ and exhausts his savings by age $T_2$, i.e. we have the following structure:
3.1 Lagrangian and dual approach
Solving the problem with conventional methods can be challenging because we need to prove that the borrowing constraint is satisfied everywhere. Without restricting the model's assumptions, this is a difficult task because the optimal consumption function provides little insight into the sign of the wealth process. In this context, it is useful to turn to the dual approach suggested in Lachance (2012) and extend it to include discontinuities and risks. Dual approaches are used with problems that are difficult to solve in their primal form, but are easier to handle in their equivalent dual form. In this case, the advantage is that we do not need to prove directly that $W_t^\ast \geqslant 0$ everywhere and this allows us to get a simpler condition.
As with conventional optimization methods, to prove that a solution is optimal the problem's Lagrangian must be derived first. Appendix A.1 describes how the standard Lagrangian is constructed and rewrites it in a form suitable for the dual approach as follows:

The process X(t) is just a transformation of the original Lagrange multipliers; it must be decreasing during the binding periods and equal to a constant λ during the non-binding period $[T_1, T_2]$. With tax risk after retirement, a different constant $\lambda_i$ applies in each state i. By construction, the constants are related by $\lambda = \sum_{i=1}^{N} p_i \lambda_i$ and can be interpreted as the marginal utility of wealth. Less technically, these conditions on X(t) mean that expected utility cannot be improved by saving an additional dollar at time t and spending it later (and vice-versa).
The first step of the dual approach is to find $c_t^\ast$ that maximizes L, which is a simple unconstrained maximization problem detailed in Section 3.2. The second step of the dual approach is to find X(t) that minimizes L, which in our setup is equivalent to:
• Solving for the constants λ and $\lambda_i$ that satisfy the budget constraint equations $W(\lambda, \lambda_i) = 0$, i = 1, …, N, and the condition $\lambda = \sum_{i=1}^{N} p_i \lambda_i$.
• If there is more than one constant λ or $\lambda_i$ that satisfies the budget constraint, the highest one is optimal.
A solution that satisfies these criteria maximizes utility and satisfies all the problem's constraints. Appendix A.2 details the equation for the budget constraint $W(\lambda, \lambda_i) = 0$ and Appendix A.3 explains how the bisection method can be employed to solve for the constants λ and $\lambda_i$.
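To illustrate how bisection can recover the multiplier from a budget constraint, the sketch below solves a deliberately stripped-down problem: annual periods, power utility, and no taxes, mortality, tax risk, or binding borrowing constraint. It is not the procedure of Appendix A.3; the names `consumption` and `solve_lambda` and the search bracket are assumptions made for the example.

```python
import math


def consumption(lam, t, r=0.03, beta=0.03, gamma=3.0):
    """Consumption from the F.O.C. exp(-beta*t) * c**(-gamma) = lam * exp(-r*t)."""
    return (lam * math.exp((beta - r) * t)) ** (-1.0 / gamma)


def solve_lambda(income, r=0.03, beta=0.03, gamma=3.0, iterations=200):
    """Bisection on lam so that the present value of savings equals zero."""

    def budget_gap(lam):
        # Present value of income minus consumption; it increases with lam
        # because a higher multiplier lowers consumption in every period.
        return sum(
            math.exp(-r * t) * (y - consumption(lam, t, r, beta, gamma))
            for t, y in enumerate(income)
        )

    lo, hi = 1e-30, 1.0  # assumed bracket, wide enough for dollar-scale incomes
    if not (budget_gap(lo) < 0.0 < budget_gap(hi)):
        raise ValueError("initial bracket does not contain a root")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)  # geometric midpoint: lam spans many orders of magnitude
        if budget_gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Example: 40 working years at $30,000 followed by 20 years without income.
income_path = [30_000.0] * 40 + [0.0] * 20
lam_star = solve_lambda(income_path)
print(consumption(lam_star, 0))
```

In this toy case the budget gap is monotone in λ, so the root is unique; the rule of selecting the highest root only matters once discontinuities allow several solutions.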
3.2 Optimal consumption and issues with discontinuities
To find $c_t^\ast$ that maximizes L unconditionally when X(t) = λ, the standard procedure is to derive the first-order condition (F.O.C.) as follows:

The first component in $L'(c_t)$ measures the marginal benefit of consumption and the second one the opportunity cost of saving, where $s_t'(c_t) = -1/(1 - \alpha\tau_k(1+M_h))$. With traditional accounts, savings do not decrease one-for-one with consumption because taxes create an additional reward (or penalty for withdrawals). In that case, higher taxes translate into higher savings before retirement and lower withdrawals after retirement.
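The slope of the savings function can be recovered with a short derivation. The sketch assumes that, within a given pair of brackets (k, h), one extra dollar of savings lowers taxes by $\alpha\tau_k(1+M_h)$, with $M_h = 0$ before retirement:

```latex
\begin{align*}
s_t &= y_t - c_t - \mathrm{tax}_t(s_t), \\
s_t'(c_t) &= -1 - \frac{\partial\, \mathrm{tax}_t}{\partial s_t}\, s_t'(c_t)
           = -1 + \alpha\tau_k(1+M_h)\, s_t'(c_t), \\
s_t'(c_t) &= -\frac{1}{1 - \alpha\tau_k(1+M_h)}.
\end{align*}
```

For instance, with α = 1, a 15% bracket, and an 85% inclusion rate, the effective rate is 27.75%, so each additional dollar of retirement consumption requires a withdrawal of 1/(1 − 0.2775), or roughly $1.38.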
When $L'(c_t)$ is continuous, $L''(c_t) < 0$ and there is a unique solution to the F.O.C. $L'(c_t^\ast) = 0$. When $\tau_k$ and $M_h$ enter $L'(c_t)$, this cannot be taken for granted because the discontinuities raise new technical issues: (1) the F.O.C. may not have a solution $c_t^\ast$ for every λ, (2) the F.O.C. can have more than one solution $c_t^\ast$ for a given λ, and (3) circularity, i.e. $\tau_k$ is a function of $c_t^\ast$ and $c_t^\ast$ is a function of $\tau_k$. Note that these discontinuity issues can also arise in retirement problems where retirement savings are rewarded (or withdrawals penalized) and the reward/penalty can suddenly change. For example, this would be the case if an employer offers a 50% match on employee contributions, but stops the match for contributions above 6%.
For the first issue, recall that Section 2.4 defined the discontinuity points $C_k$ and $C_h$ where $\tau_k$ and $M_h$ jump. Here, the notation C will stand for any of these points. The problem is that it is not possible to solve for c in $L'(c) = 0$ when at a discontinuity point C we have $L'(C^-) > 0$ and $L'(C) < 0$. Actually, with $L''(c) < 0$, C is locally optimal because L(c) increases up to C and decreases thereafter. Since the equation for $L'(C)$ in (17) is a function of λ, it will be convenient to rewrite the condition in terms of a range of values for λ as follows:

where the function

is simply introduced to express the solution more compactly later on.Footnote 11 Outside these values for λ, the standard F.O.C. solution in (17) applies. This approach with brackets for λ offers a simple test to determine when an algorithm should use the standard F.O.C. solution in (17) and when it should use the discontinuity points C. It also has the advantage of addressing the circularity issue: for a given λ, we know which $\tau_k$ or $M_h$ to use in $c_t^\ast(\lambda)$.
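The bracket test can be sketched for a single upward jump in the effective rate under power utility. The bounds below are derived from the sign conditions $L'(C^-) > 0$ and $L'(C) < 0$ rather than copied from equations (18) and (19); `price` is a hypothetical shorthand for λ scaled by the relevant discount terms, and the function name is likewise an assumption.

```python
def consumption_at_kink(price, kink, rate_below, rate_above, gamma=3.0):
    """Optimal consumption around one upward jump in the effective rate.

    price      -- assumed shorthand for lambda scaled by the discount terms
    kink       -- consumption level C at which the rate jumps
    rate_below -- alpha * tau * (1 + M) applying for c < C
    rate_above -- alpha * tau * (1 + M) applying for c >= C (exceeds rate_below)
    """
    marginal_utility_at_kink = kink ** (-gamma)
    lower = marginal_utility_at_kink * (1.0 - rate_above)
    upper = marginal_utility_at_kink * (1.0 - rate_below)
    if lower < price < upper:
        # L'(C-) > 0 and L'(C) < 0: no interior F.O.C. solution, so c* = C.
        return kink
    # Outside the bracket the rate is known, which removes the circularity.
    rate = rate_above if price <= lower else rate_below
    # Interior F.O.C. with power utility: c**(-gamma) = price / (1 - rate).
    return (price / (1.0 - rate)) ** (-1.0 / gamma)
```

The sketch covers only the case where the effective rate rises with consumption; a downward jump, where two local optima can coexist, requires the separate comparison discussed next.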
The second problem that we can encounter is having more than one locally optimal solution. This happens when $L'(c_t)$ increases at a discontinuity, i.e. in the uncommon scenario where the effective marginal tax rate decreases with consumption. In this model, this occurs at the point $C_{h=3}$ where taxable Social Security benefits reach the 85% maximum and $M_h$ drops from 85% to 0%. Specifically, when $L'(C_{h=3}^-) < 0$ and $L'(C_{h=3}) > 0$, there are two locally optimal solutions $c_1 < C_{h=3}$ and $c_2 > C_{h=3}$ (with h < 3 for $c_1$ and h = 3 for $c_2$). The globally optimal one maximizes $L(c_t)$ and to express this more systematically, we solve for a point $\lambda = \bar{\Lambda}$ such that the individual is indifferent between $c_1$ and $c_2$. It is possible to show that $c_1$ is optimal for all $\lambda > \bar{\Lambda}$, $c_2$ is optimal for all $\lambda < \bar{\Lambda}$, and the solution jumps between $c_1$ and $c_2$ when $\lambda = \bar{\Lambda}$.Footnote 12 Figure 1 makes this result more intuitive by illustrating the solution graphically for the cases $\lambda < \bar{\Lambda}$, $\lambda = \bar{\Lambda}$, and $\lambda > \bar{\Lambda}$.

Figure 1. Illustration of jump in optimal consumption (optimal ct maximizes L(ct)).
Combining these results, the optimal solution in the non-binding period after retirement $t \in [R, T_2)$ can be condensed with:

for $\lambda \geqslant \bar{\Lambda}$ and

for $\lambda < \bar{\Lambda}$. Before retirement, the points $C_h$ related to Social Security do not apply and the solution in the non-binding period $[T_1, R)$ is given by (21) for all λ. In the pure Roth case (α = 0), only the top part of the solution applies. In other words, working with Roth accounts in optimization problems is much easier than working with traditional accounts.
If the utility function takes the standard power utility form $u(c) = c^{1-\gamma}/(1-\gamma)$ with γ ≠ 1, the top portion of the solution in (20) and (21) becomes:

Appendix B shows how this equation can be combined with a flexible set of assumptions for mortality and income to express the budget constraint $W(\lambda, \lambda_i) = 0$ as a series of closed-form equations.
3.3 Comparison with dynamic programming
In the previous literature, dynamic programming techniques have been the tool of choice for solving life-cycle models with risks and constraints, and it can be useful to contrast them with the approach described in this section. With dynamic programming, the problem is to solve the Bellman equation $V(W_t) = \max_{c_t} [u(c_t) + \beta p_t E[V_{t+1}(W_{t+1})]]$ subject to the budget constraint in (11), which is essentially a discretized version of our earlier problem of maximizing $L(c_t)$. Since the functional form for the value function $V_{t+1}(W_{t+1})$ is not known, it must be estimated numerically by backward induction for a set of values, and then interpolated between these points. With this interpolated function, the simplest approach to solve the Bellman equation is to test every possible value of $c_t$: this is a slow process and has limited precision unless a fine grid is used. The circularity issue mentioned previously also makes the process more time consuming.
Although several numerical techniques can be used to perform a more efficient search for optimal consumption, their application is not straightforward with discontinuities. For instance, most assume the existence of a single optimum or solution within a given interval, but Figure 1 illustrates that this premise can fail. Furthermore, techniques based on first-order conditions or related Euler equations have to be modified because the solution in (20) and (21) shows that several cases have to be considered. In other words, applying these techniques to problems with discontinuities requires some adjustments and an analytical process such as the one in Section 3.2 can be followed to determine the nature of the changes.
The technique used in this paper is particularly interesting because the solution is based on a known budget constraint instead of an unknown value function, which eliminates the need for interpolation and leads to exact values. In addition, it is not necessary to numerically optimize $c_t^\ast$ as the solution is given in (20) and (21). Arguably, the approach suggested here is limited in the sense that it is based on a one-time risk, but the same concepts could be extended to multiple risky periods. With M possible risky paths, the crux of the problem would be to solve a system of M equations (the budget constraints, possibly as a series of closed-form expressions) and M unknowns $\lambda_1, \ldots, \lambda_M$.Footnote 13 If this system can be solved in a reasonable amount of time, then the approach can be an interesting alternative to dynamic programming. With a one-time risk, the problem proved to be particularly manageable as it can be reduced to a one-equation-in-one-unknown problem that can be solved with the bisection method.
4 Assumptions for numerical illustrations
We use the equations detailed in Appendix B and the assumptions detailed in this section to generate the numerical illustrations in the remainder of this paper. The individual starts the problem at age $t_0$ = 25 and lives up to age ω = 105. The model uses a discrete mortality table, which is calibrated with survival probabilities derived from the National Center for Health Statistics 2005 data.Footnote 14 The baseline parameters for the power utility function, the rate of return, and the time discount factor are, respectively, γ = 3, r = 3%, and β = 3%.
As in Cocco et al. (2005), the income profiles are calibrated using data from the Panel Study of Income Dynamics (PSID) for three education levels: less than high school (LHS), high school (HS), and college.Footnote 15 Figure 2 illustrates the income profiles for each of the three education groups. Social Security benefits are computed according to the formula applicable in 2010.Footnote 16 The annual benefits are respectively $13,239 for the less-than-high-school group, $16,481 for the high-school group, and $23,278 for those with a college degree. For the pension income assumption, the use of a baseline ‘average’ scenario is more problematic because many workers have no pension at all, while others have very generous defined benefit plans. To reflect this heterogeneity, the results are illustrated for three scenarios for $y_R$: no pension income, the highest pension income possible ($4,000 for LHS, $6,000 for HS, and $24,000 for college), and a mid-point.Footnote 17

Figure 2. Income profiles by education.
The payroll tax rate π is 7.65%. The tax brackets [Bk, Bk+1) and marginal tax rates τk are taken from Schedule X of the 2010 IRS 1040 form for singles, which is reproduced in Table 1. The standard deduction is $5,700 and there is an additional $1,400 deduction for singles above age 65. Since the personal exemption for a single person is $3,650, a total of E = $9,350 (or ER = $10,750 with the additional deduction) is excludible from taxable income. The base amounts for the taxation of Social Security benefits are X1 = $25,000 and X2 = $34,000; they vary over time according to (7). The tax brackets, deductions, and bend points in the Social Security benefits formula are assumed to increase with inflation in the future. The base amounts used in the taxation of Social Security benefits are fixed in nominal terms; an inflation rate of 3% is used to reflect their decrease in real terms. To reflect these differences, the results are presented for individuals who were respectively aged 25, 45, and 65 in 2010.
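As a rough check on the figures above, the following sketch adds up the excludible amounts and deflates the fixed base amounts at the 3% inflation rate. It is only an illustration: the annual compounding, the choice to evaluate each cohort at age 65, and all variable names are assumptions made here, and the model's actual time path for the base amounts is the one given in (7).

```python
# Excludible amounts built from the 2010 figures quoted in the text.
STANDARD_DEDUCTION = 5_700
PERSONAL_EXEMPTION = 3_650
ADDITIONAL_DEDUCTION_65 = 1_400

E = STANDARD_DEDUCTION + PERSONAL_EXEMPTION   # 9,350
ER = E + ADDITIONAL_DEDUCTION_65              # 10,750

# Base amounts for the taxation of Social Security benefits (not indexed).
X1, X2 = 25_000, 34_000
INFLATION = 0.03


def real_value(nominal, years, inflation=INFLATION):
    """Value in 2010 dollars of a fixed nominal amount `years` after 2010.

    Assumption: annual compounding; the paper's equation (7) may differ.
    """
    return nominal / (1 + inflation) ** years


if __name__ == "__main__":
    print(f"E = {E}, ER = {ER}")
    # Hypothetical horizon: years until each cohort reaches age 65.
    for age_in_2010 in (25, 45, 65):
        years = 65 - age_in_2010
        print(age_in_2010, round(real_value(X1, years)), round(real_value(X2, years)))
```

The point of the loop is simply that the younger the cohort, the lower the real base amounts it faces at retirement, which is why results are reported by cohort.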
Table 1. Marginal tax rates

Source: Schedule X of the 2010 IRS 1040 form for singles.
5 Endogenous tax rates and optimal consumption patterns
With standard problems, marginal tax rates are exogenously given. In contrast, the rates that apply to traditional accounts in this model are endogenously determined and can vary with time. To give the reader a sense of these rates, the bottom part of Figure 3 graphs the marginal tax rates τk (straight lines) and the effective marginal tax rates τk(1+Mh) (dashed lines) for the traditional account solution as a function of age. Six representative cases with different education, cohort, and pension income are considered: Case 1 (no tax after retirement), Case 2 (few taxes after retirement), Case 3 (mixed rates after retirement), Case 4 (slightly higher taxes after retirement), Case 5 (much higher taxes after retirement), and Case 6 (slightly higher taxes after retirement).

Figure 3. Examples of optimal consumption patterns.
In our illustrations, the marginal tax rate before retirement is 15% for the LHS/HS groups and 25% for the college group. After retirement, effective marginal tax rates can be higher or lower and several rates can apply to the same person. The results in Figure 3 underscore the role played by the taxation of Social Security benefits: without it, Roth accounts would never be strictly preferred in our illustrations as the marginal tax rates τk either stay the same or decrease after retirement. When Mh is taken into account, in many cases the resulting effective rates are above the pre-retirement rates, making Roth accounts potentially attractive. The question then is who is affected by the taxation of Social Security benefits. Currently, it is mostly those with higher incomes – in our illustrations, those in the college group. However, Social Security's base amounts are not indexed for inflation, which means that the situation will eventually apply to those in the high-school group for younger cohorts.
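To make the effective rate τk(1+Mh) concrete, the short sketch below evaluates it for a few statutory rates. The function name is hypothetical, and the set of values for Mh (0%, 50%, and 85%) is an assumption: the text only mentions 0% and 85% explicitly, with 50% added here as the intermediate inclusion tier of the U.S. rules.

```python
def effective_marginal_rate(tau, m):
    """Effective marginal tax rate tau * (1 + M) on traditional-account withdrawals.

    tau: statutory marginal tax rate.
    m:   additional Social Security benefits made taxable per extra dollar
         of withdrawals (assumed here to be 0, 0.50 or 0.85).
    """
    return tau * (1 + m)


M_VALUES = (0.0, 0.50, 0.85)

if __name__ == "__main__":
    for tau in (0.10, 0.15, 0.25):
        rates = [effective_marginal_rate(tau, m) for m in M_VALUES]
        print(f"tau = {tau:.0%}:", ", ".join(f"{r:.2%}" for r in rates))
```

With a 15% statutory rate and M = 85%, the effective rate is 27.75%, which is above the 15% or 25% paid before retirement by the groups in these illustrations; this is the mechanism that can make Roth accounts attractive.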
To show the connection between marginal tax rates and consumption, the top portion of Figure 3 gives the optimal consumption patterns for the traditional and Roth cases. In the figure, the link is evident: consumption is higher when marginal tax rates are lower (and vice-versa). Although the optimal consumption patterns for the traditional and Roth cases are similar in the sense that savings start and end around the same ages, they differ in that the Roth solution is smooth and the traditional solution is more jagged.Footnote 18 As explained in Section 3, the solution goes through flat portions (at Ck or Ch) when there is a transition between two tax rates, which can be observed in Cases 2, 3, 5, and 6. When Mh changes from 0% to 85%, it is also possible to have a jump in consumption: Case 3 provides an example of this when consumption suddenly declines by $2,000 at age 82.
6 Who gains by choosing Roth accounts?
If an individual can only invest in a traditional or a Roth account, which one of the two is most beneficial? To answer that question, the first column of Table 2 presents a dollar measure of the welfare gains/losses that traditional accounts generate over their Roth counterparts. Appendix C details the equation used for that computation and Table 2 gives the results for each of the education groups (LHS, HS, and college), for three different cohorts (ages 25, 45, and 65 in 2010), and for three levels of pension income.Footnote 19
Table 2. Welfare gains/losses and related measures without tax risk

1 This is the percent increase in marginal tax rates after retirement that would make the individual indifferent between the traditional and Roth accounts. N/A indicates that the individual does not pay taxes after retirement, or pays so little that it would take an increase of over 80% to make the individual indifferent.
2 For the case where the individual invests all his savings in a Roth account, this column gives the present value of the increase in taxes that would result if withdrawals from Roth accounts were counted in the taxation of Social Security benefits.
The results in Table 2 indicate that in most cases choosing traditional accounts over Roths generates a welfare gain. The exceptions occur mainly when the taxation of Social Security benefits pushes the effective marginal tax rates after retirement above their pre-retirement levels, as illustrated in Cases 3–6 of Figure 3.Footnote 20 As discussed in the previous section, this mostly affects those in the college group for the current cohort of retirees, but will eventually affect those in the high-school group for the youngest generation. For example, Roth accounts are preferred by those who are currently 25 years old with yR = $6,000.Footnote 21 Those in the youngest generation who still prefer traditional accounts are also affected by the taxation of Social Security benefits as it cuts their welfare gains by about half.
6.1 Breakeven increase in future tax rates
Since an increase in future tax rates would favor Roth accounts, it is interesting to ask: What is the magnitude of the increase needed to change the traditional account recommendation? To answer that question, recall that the model in Section 2 allows for a multiplicative increase θ that applies to all marginal tax rates after retirement. Using this framework, we solve for the breakeven increase θ that would make the individual indifferent between traditional and Roth accounts and present the results in the second column of Table 2. Note that the breakeven rates are negative for those who initially prefer Roth accounts. The table indicates ‘N/A’ for those who pay no (or very little) taxes after retirement: since changes in future marginal tax rates are not an issue for them, they will prefer traditional accounts no matter what. Future tax changes, however, are relevant for those with a college degree and in a few of the high-school cases. Table 2 shows that for them, breakeven rates range between 4% and 41%. Those with high pensions have low breakeven rates and can be affected by even small changes in tax rates. In contrast, those with a college degree and no pension income are less sensitive and would require a more substantial increase of 41% (age 45 in 2010) or 33% (age 25 in 2010) to justify the switch to Roth accounts. To put this in perspective, a 33% increase would change the marginal tax rates in Table 1 to 0%, 13%, 20%, 33%, 37%, 44%, and 47%.
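The effect of a multiplicative increase on the rate schedule can be checked with a few lines of code. This is a minimal sketch: the list of base rates is an assumption, inferred from the 2010 Schedule X rates plus a 0% bracket for income below the excludible amounts, and the function name is hypothetical.

```python
# Assumption: marginal tax rates of Table 1, in percent.
BASE_RATES = [0, 10, 15, 25, 28, 33, 35]


def rates_after_increase(rates, increase):
    """Apply a multiplicative increase (e.g. 0.33 for 33%) to every marginal rate."""
    return [rate * (1 + increase) for rate in rates]


if __name__ == "__main__":
    new_rates = rates_after_increase(BASE_RATES, 0.33)
    # Rounded to the nearest percent for comparison with the rates quoted in the text.
    print([round(rate) for rate in new_rates])
```

Because the increase is multiplicative, the 0% bracket is unaffected and the higher brackets rise by more percentage points than the lower ones.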
Although Roth accounts are immune to increases in future marginal tax rates, it should be acknowledged that they are subject to different risks. For example, a change in the tax law could include withdrawals from Roth accounts in the taxation of Social Security benefits. To illustrate the magnitude of this issue, the third column of Table 2 gives the resulting increase in the present value of taxes for those who save with Roth accounts. This amount can be quite substantial, reaching up to $18,443. If this change were expected with certainty, Roth accounts would never be optimal in our illustrations. In addition, Roth accounts would lose some of their appeal for younger cohorts if Social Security's base amounts become indexed for inflation. Of course, substantial tax reforms would also affect the comparison; for example, Kotlikoff et al. (2008) evaluate the impact of a change to a consumption tax. Unfortunately, most strategies for retirement savings are not truly risk-free as long as the tax code can change and it is difficult to assign a probability distribution to these changes.
6.2 Naïve diversification strategies
Discussions of tax risk in the context of retirement savings are often accompanied by a suggestion to diversify among saving vehicles. For example, in a Vanguard document, Ahern et al. (2005) state: ‘Pre-tax savings are more beneficial if a participant is in a lower tax bracket in retirement; Roth savings are more beneficial if a participant is in a higher bracket. In a world of uncertain future tax rates, participants should diversify. Just as they hold fixed income assets to diversify the risks of stocks, so participants should hold Roth savings to diversify the risks associated with pre-tax savings’. To investigate the potential benefits of having both types of accounts, we use the naïve diversification strategy defined in Section 2 where the individual allocates a constant proportion α of savings/withdrawals to the traditional account and a proportion 1 − α to the Roth account. Of course, this strategy is limited in the sense that the individual cannot adjust α in every period, but it gives us a simple platform to assess the welfare gains stemming from risk reduction. In the next subsection, we will discuss how mixed strategies can be improved over naïve ones.
Before evaluating the risk reduction benefits associated with a mixed strategy, we must first recognize that this approach can increase welfare even in the absence of risk. To understand why, consider the following example where the marginal tax rates before and after retirement are respectively 15% and 10%, but withdrawals in excess of $10,000 trigger the taxation of Social Security benefits and are subject to an effective marginal tax rate of 18.5%. An individual who has to make annual withdrawals of $15,000 can use a naïve strategy with α = 2/3 and withdraw $10,000 from a traditional account and $5,000 from a Roth account. This strategy is beneficial because it allows the individual to gain from the lower tax rate of 10%, while avoiding the higher rate of 18.5%.
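The arithmetic behind this example can be sketched as follows. The code only computes the tax due on one year of withdrawals in the stylized example; it is not the welfare measure used in the paper, and the function and constant names are hypothetical.

```python
PRE_RETIREMENT_RATE = 0.15   # marginal rate paid on Roth contributions while working
LOW_RATE = 0.10              # rate on the first $10,000 of traditional withdrawals
HIGH_RATE = 0.185            # effective rate once Social Security benefits are taxed
THRESHOLD = 10_000


def tax_on_traditional_withdrawal(amount):
    """Tax due on an annual traditional-account withdrawal in the stylized example."""
    low_part = min(amount, THRESHOLD)
    high_part = max(amount - THRESHOLD, 0)
    return LOW_RATE * low_part + HIGH_RATE * high_part


def retirement_tax(total_withdrawal, alpha):
    """Tax when a share alpha is withdrawn from the traditional account.

    The Roth share (1 - alpha) is untaxed at withdrawal because it was
    already taxed at PRE_RETIREMENT_RATE when contributed.
    """
    return tax_on_traditional_withdrawal(alpha * total_withdrawal)


if __name__ == "__main__":
    for alpha in (1.0, 2 / 3, 0.0):
        print(f"alpha = {alpha:.2f}: tax = {retirement_tax(15_000, alpha):,.0f}")
```

The comparison of marginal rates drives the result: dollars taxed at 10% in retirement are better saved in the traditional account (10% < 15%), while dollars that would be taxed at 18.5% are better saved in the Roth account (15% < 18.5%).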
For the case without tax risk, we compute the welfare gains associated with each α from 0 to 1 and thus are able to find the optimal α. The results are presented in the fourth and fifth columns of Table 2 for each of the cases considered previously. Figure 4 also gives a graphical representation of the welfare gains as a function of α for three representative cases. Those who do not encounter issues with the taxation of Social Security find it optimal to allocate 100% of savings to traditional accounts. A 100% Roth strategy is optimal for those whose effective marginal tax rates after retirement are higher than before retirement even before making any withdrawal. For those who are in the situation described in the previous paragraph (about 40% of our cases), it is optimal to divert some (but not all) savings to the Roth account to avoid the higher marginal tax rates. Note that the welfare gains with the optimal α are always non-negative: losses with traditional accounts can be avoided when there is an option to allocate part of savings to a Roth account. The value of the option to invest a fraction of savings in Roth accounts can be computed by taking the difference between the two welfare gains in Table 2: the last column of Table 2 shows that this value is on average $1,800 when positive, with a maximum of $4,518.Footnote 22

Figure 4. Welfare gains with naïve diversification strategies (over Roth accounts) for selected cases (age 45 in 2010).
We now introduce tax variability with a simple no-drift scenario where marginal tax rates after retirement can go up or down by 20%. Using Section 2's notation, this translates into p1 = p2 = 50%, θ1 = 120%, and θ2 = 80%. Figure 4 contrasts the welfare gains for the cases with risk (dashed lines) and without tax risk (solid lines). For those in the LHS/HS education groups, the lines coincide and tax risk has essentially no impact on welfare gains. For those in the college group, we observe a small difference and tax risk reduces welfare by at most $409 when α = 100%. The loss attributable to tax risk diminishes with diversification; for example, it is cut by $281 when α = 50%. However, Figure 4 shows that risk reduction benefits are not the only consideration when choosing α: gains or losses in the scenario without risk must also be taken into account. In the previous example with α = 50%, diversification is suboptimal because the certain loss ($1,580) is much higher than the risk reduction benefits ($281).
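The two numerical points in this paragraph can be verified directly. In the sketch below, the variable names are hypothetical, and netting the risk reduction benefit against the certain loss is one simple reading of the comparison made in the text.

```python
# Two-state tax risk scenario: multipliers applied to post-retirement rates.
probabilities = [0.5, 0.5]
multipliers = [1.20, 0.80]

# "No drift" means the expected multiplier equals one.
expected_multiplier = sum(p * m for p, m in zip(probabilities, multipliers))

# Dollar figures quoted in the text for the college example with alpha = 50%.
risk_reduction_benefit = 281
certain_loss = 1_580
net_effect = risk_reduction_benefit - certain_loss

if __name__ == "__main__":
    print(f"expected multiplier: {expected_multiplier:.2f}")
    print(f"net effect of alpha = 50%: {net_effect:,} dollars")
```

A negative net effect is the sense in which diversification is suboptimal here: the reduction in risk does not compensate for the loss in the no-risk scenario.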
More generally, in most of our illustrations the magnitude of the risk reduction benefit that a given α brings is much smaller than the corresponding gain or loss in the no-risk case. Indeed, the optimal proportions allocated to traditional accounts in Table 2 are essentially the same for the cases with and without tax risk. Although intuitive at first, the analogy with a diversified portfolio of stocks and bonds does not translate well because traditional and Roth accounts can have great differences in expected values, which dominate the volatility effect. On the other hand, the certainty case showed that the peculiar nature of the tax structure provides a new motivation for diversifying: a mix of traditional and Roth accounts can have a higher expected value than a linear combination of the pure cases.
6.3 Mixed strategies
The naïve diversification strategy suggested in the previous section can be fine-tuned to further improve welfare. For instance, instead of mixing both accounts in every period before retirement, the optimal strategy is likely to involve a switch between periods where contributions are either 100% traditional or 100% Roth.Footnote 23 Unless there are major fluctuations in income, there should be relatively few transitions between the two accounts. Starting with the Roth account can be justified by lower marginal tax rates at that time or by concerns about early withdrawals and the 10% penalty tax.Footnote 24 Conversely, starting with the traditional account is preferable when these are not an issue and marginal tax rates decline after retirement.Footnote 25 If savings in the traditional account eventually reach a level such that the marginal tax rate after retirement exceeds the pre-retirement one, a permanent switch to Roth accounts would be recommended. Temporary moves to the Roth side could also be motivated in periods of income loss or with particularly high tax deductions. After retirement, the strategy would be a generalization of our earlier example with α = 2/3 to every period: withdrawals would be made from the traditional account first, and α would be chosen such that higher marginal tax rates are avoided.
7 Retirement savings
The next question investigated is whether tax deductible contributions increase retirement savings. If so, is it an income or a substitution effect? In other words, savings may increase either because they are augmented by a tax subsidy or because people sacrifice more consumption before retirement. For this discussion, a measure of gross retirement savings can be obtained by computing the accumulated value of savings as follows:

In the traditional account case, gross retirement savings are inflated in the sense that taxes will have to be paid on withdrawals. Accordingly, we also define a net measure of retirement savings where a ‘tax liability’ is deducted. This tax liability measures the present value of taxes attributable to withdrawals (st) and it can be obtained by taking the difference between taxes with and without withdrawals as follows:

7.1 Increases in retirement savings
Table 3 presents the gross and net levels of retirement savings at age 65 for each scenario, assuming that the individual was aged 45 in 2010. By education group, these range from: $40,000–$100,000 (LHS), $60,000–$140,000 (HS), and $50,000–$350,000 (college). Not surprisingly, higher income translates into higher savings, whereas higher pensions reduce the need to accumulate wealth. The bottom part of Table 3 tests the sensitivity of these results to the following changes in parameters: a = 25, a = 45, r = 1%, r = 5%, β = 1, β = 5, γ = 1, γ = 5, a reduction of 25% in Social Security benefits, and an increase of $5,000 in the exemption amounts E and ER. All cases are considered in the analysis, but due to space limitations the tables present only the averages for all education/income categories. The results show that the level of retirement savings is very sensitive to the choice of parameters, being halved or almost doubled in some scenarios.
Table 3. Retirement savings at age 65 (in dollars; age 45 in 2010)

The most salient result in Table 3 is the marginal impact of traditional accounts on retirement savings. First, those in the less-than-high-school category increase their savings by about $15,000. In their case, the gross and net differences are the same because they do not have enough income after retirement to pay taxes. The differences for the high-school group are much smaller: the gross increase is about $6,000 and the net increase averages only $200. Finally, the college group displays the most striking results. In their case, the gross increase can be substantial, reaching up to $68,000. However, once the increase in the tax liability is considered, this gain mostly vanishes, with an average of only $2,800. Testing the sensitivity of these results, we find that the meaningful increase in net retirement savings for the less-than-high-school group is generally robust. The high-school group exhibits more variable results, ranging from −$20,000 to $20,000. The college group displays a pattern mostly similar to the one observed in the top portion of Table 3: a high increase in gross retirement savings is experienced, but most of it goes away once the net values are considered.
7.2 Income and substitution effects
Traditional accounts are more effective at increasing savings in some cases than others. To understand why, this section suggests a breakdown for the increase in retirement consumption (or equivalently the increase in net retirement savings) into an income effect and a substitution effect. The income effect comes from the additional tax subsidy that tax deductible contributions generate. It can be computed as the difference in the lifetime value of taxes as follows:

It should be noted that retirement consumption does not increase by the entire amount of the tax subsidy: the wealth effect increases consumption in all periods, both before and after retirement.Footnote 26 Introducing the notation $C_{t,T} = \int_{t}^{T} \mathrm{e}^{-r(R-s)} c_{s}^{\ast}\,\mathrm{d}s$, the proportion of the tax subsidy allocated to retirement consumption can be estimated with the following ratio:

A reduction in tax rates after retirement not only creates a subsidy but also a substitution effect by lowering the relative price of post-retirement consumption. The value of the new savings before retirement associated with the substitution effect can be computed with $\text{New Savings} = C_{t_{0},\omega}^{\mathrm{Roth}} (q^{\mathrm{Trad}} - q^{\mathrm{Roth}})$. Accordingly, the total changes in pre- and post-retirement consumption Footnote 27 can be written as

Table 4 presents the results of the decomposition for the change in retirement consumption along with the tax subsidies and the applicable effective marginal tax rates. The table includes the previous cases from Figure 3 and we will use them as representative examples. The average tax subsidy in Table 4 is about $10,000; since $q^{\mathrm{Trad}}$ is generally around 20%, the average increase in retirement consumption attributable to the tax subsidy is only $2,000. By themselves, subsidies can be a relatively expensive way to generate a small increase in retirement consumption. For example, the increase is at best $9,240 in Case 6, but the total cost of the subsidy is $48,396.
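The order of magnitude of the income effect follows from a one-line calculation. The sketch below reproduces it with the rounded figures quoted above; the function name is hypothetical, and the Case 6 ratio is computed here from the two quoted dollar amounts rather than taken from Table 4.

```python
def retirement_consumption_increase(tax_subsidy, share_to_retirement):
    """Part of a tax subsidy that ends up as additional retirement consumption."""
    return share_to_retirement * tax_subsidy


# Average case: subsidy of about $10,000 and a share of about 20%.
average_increase = retirement_consumption_increase(10_000, 0.20)

# Case 6: implied share of the subsidy that reaches retirement consumption.
case6_share = 9_240 / 48_396

if __name__ == "__main__":
    print(f"average increase: {average_increase:,.0f}")
    print(f"Case 6 share: {case6_share:.1%}")
```

In both calculations roughly one dollar in five of the subsidy reaches retirement consumption, which is why subsidies alone are described as an expensive instrument.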
Table 4. Increase in retirement consumption

The substitution effect can generate larger increases in retirement consumption than the income effect, but only for some groups. It works well for those who pay little or no taxes after retirement; for example, in Cases 1 and 2 the associated increases in retirement consumption are $13,745 and $14,248. For those who pay meaningful taxes after retirement, the taxation of Social Security benefits again changes the picture and reduces the substitution effect's potential. In many cases, marginal tax rates increase and the substitution effect is actually negative. For example, in Case 5 the marginal tax rate after retirement jumps to 27.75% and retirement consumption is reduced by $13,214.
To conclude this section, it is interesting to put the tax subsidies in perspective by observing that they move in sync with the welfare gains in Table 2. In other words, welfare gains associated with the tax deductibility of contributions arise because they are paid for by tax subsidies. Tax deductible contributions do not generate a benefit above their cost: on average, welfare gains are $500 less than tax subsidies because consumption patterns are disrupted. Although tax deductible contributions benefit most people in partial equilibrium, it should be kept in mind that this is not necessarily the case in general equilibrium where tax subsidies have to be financed by an increase in other taxes.
8 Conclusion
The results in this paper have a number of interesting applications for financial planners and 401(k)s. For financial planners who advise clients on their Roth/traditional decision, it is a good news, bad news story. The bad news is that the case for which we can determine with more conviction that traditional accounts are superior is for those who have lower incomes (i.e. those who are less likely to seek advice). For those with higher incomes and some pensions, the results are often not as clear cut – the good news is that welfare losses from making the wrong decision are not excessively high. Actually, we show that a mixed traditional/Roth strategy can improve welfare when it helps to avoid the higher marginal tax rates due to the taxation of Social Security benefits. In contrast, we find that naïve diversification strategies offer limited risk reduction benefits when tax risk takes only the form of variability in future tax rates.
Our findings also have applications in the realm of 401(k)s: recently, employees were given the possibility of directing their 401(k) contributions to a Roth account, but employer matching contributions can only go to a traditional account. As some employees can lose with traditional accounts, our results suggest that extending the Roth opportunity to employer contributions would benefit these employees. Moreover, default strategies are becoming increasingly important in 401(k) plans. Although these have focused on contribution levels and asset allocation, it would also be interesting to consider which type of retirement account (traditional or Roth) should be set as a default option. This paper illustrated results for a wide range of cases and offers a starting point for this type of analysis. In particular, this paper hopes to raise awareness in terms of the underappreciated role played by the taxation of Social Security benefits.
This paper also offers new developments from a methodological perspective. By exploiting a dual approach, we illustrate how an analytical framework can be retained even when incorporating elements such as borrowing constraints, risks, and discontinuities. The analytical approach also provides valuable new insight by showing that the solution can be based on a system of known budget constraint equations. The benefit of this alternative formulation over more conventional dynamic programming is twofold: it produces exact values and eliminates the time-consuming process of estimating an unknown value function with numerical optimization and backward induction.
Finally, this paper found that the differences between Roth and traditional accounts are limited to some groups within the context of a standard life-cycle model analysis. The interesting question then is whether there are also practical considerations that affect the comparison. For example, the tax deductibility of contributions may be associated with behavioral effects that lead people to save more than they would with Roth accounts. That could be the case if the value of the immediate tax refund looms larger than the associated tax liability in the decision. Another intriguing issue is the fact that the tax deductibility of contributions can increase gross savings significantly. This is advantageous for an individual who is able to achieve superior investment returns. Similarly, having more assets under management is obviously beneficial for the industry. On the other hand, postponing tax receipts might not be desirable for cash-strapped governments. In addition, unsophisticated investors are prone to investment mistakes and this problem is leveraged with larger assets. These potential issues are left for future research.
Appendix A: Technical details for the solution of Section 2.3's optimization problem
A.1 Standard Lagrangian and dual approach
To prove that a solution to an optimization problem with constraints is optimal, the standard approach is to start by constructing a Lagrangian function where each constraint is multiplied by its Lagrange multiplier and the result appended to the objective function as follows:

In (A.1), Wt is the wealth process and μ and ηt denote, respectively, the Lagrange multipliers for the budget constraint and the borrowing constraint at time t. In state i after retirement, piμi and $p_{i}\eta_{t}^{i}$ are used instead. To reduce clutter, we omit the subscripts i in this Appendix unless necessary. With this notation defined, an optimal solution must satisfy the following four Karush–Kuhn–Tucker (KKT) necessary conditions for all t: (1) $W_{0}^{\ast} = 0$ and $W_{t}^{\ast} \geqslant 0$, (2) $\eta_{t} \geqslant 0$, (3) $\eta_{t} W_{t}^{\ast} = 0$, and (4) $L'(c_{t}^{\ast}) = 0$. As mentioned in Section 3, the solution $c_{t}^{\ast}$ to $L'(c_{t}^{\ast}) = 0$ gives us little insight in terms of showing that $W_{t}^{\ast} \geqslant 0$ everywhere.
To overcome this problem, the dual approach suggested in Lachance (2012) is used and adapted to handle tax risk and discontinuities. Following He and Pages (1993) and applying integration by parts,Footnote 28 the Lagrangian in (A.1) can be rewritten in terms of a process X(t) as follows:

where

With this form, the dual approach can be applied as a two-step process: (1) find $c_{t}^{\ast}$ that unconditionally maximizes the Lagrangian and (2) substitute the result in L to find the process X(t) that minimizes L.Footnote 29 The advantage of this formulation is that it does not require that we show that $W_{t}^{\ast} \geqslant 0$ everywhere. The first step is easy because we solve an unconstrained problem instead of a constrained one; the solution for $c_{t}^{\ast}$ is given in Section 3.2 and is not repeated here. The next section explains how to solve for X(t) in the second step.
A.2 Solution for X(t)
Binding periods: In periods where the borrowing constraint is binding, the individual consumes as much as he can and $c_{t}^{\ast} = \bar{y}_{t} = y_{t} - \mathrm{tax}_{t}(0)$. Substituting this in equation (17) and replacing λ by X(t), we can invert the only possible solution X(t) = λ(t) where

Note that binding periods can only happen in periods when λ(t) is decreasing since X(t) must be decreasing in these periods.
Non-binding periods: Within a period where $W_{t}^{\ast} > 0$, by construction X(t) must be constant and we denote this by X(t) = λ.Footnote 30 With tax risk, a different constant λi is used in each state i after retirement. If there is more than one non-binding period, a different (and decreasing) constant λ would be used in each separate period. For simplicity, we use a single period [T1, T2] below, but the same concepts would apply with multiple periods.
Connection points between periods: At the connection points T1 and T2, the solution is generally continuous and λ(T1) = λ = λ(T2).Footnote 31 With risk, this condition becomes:

Within intervals where λ(t) is strictly decreasing, this condition can be used to express T1 and T2 as inverse functions T1(λ) and T2(λ). For the special case where T1 = t0, the condition λ(T1) = λ becomes λ ⩾ λ(t0).
Budget constraint and X(t) that minimizes L: For each state i = 1, …, N, the budget constraint can be written as the present value of savings over the interval $[T_{1}, T_{2}^{i}]$ with

Restricting the processes X(t) to those that satisfy the budget constraint, the second component in L is zero. L becomes:

and we can show that dL/dλ < 0. Since L decreases with λ, the criterion ‘X(t) that minimizes L’ reduces to choosing the process with the highest λ among those that satisfy the budget constraint. The next section gives a practical algorithm to solve for λ.
A.3 Algorithm to solve for λ
To develop a practical algorithm to solve for λ, the key is to formulate the problem as an equation g(λ) = 0 to which the bisection method can be applied. Recall that for an interval λ∊(λ1, λ2) on which g is continuous, the intermediate value theorem guarantees the existence of a solution to g(λ) = 0 if g(λ1) < 0 and g(λ2) > 0, and this solution is unique if g′(λ) > 0 for all λ∊(λ1, λ2). From the previous section, our problem is to find values λ and $\lambda_{i}$ such that the N budget constraint equations W(λ, λi) = 0 are satisfied, the condition $\lambda = \sum_{i=1}^{N} p_{i}\lambda_{i}$ is met, λ ⩾ λ(t0) if T1 = t0, λ = λ(T1) if T1 > t0, $\lambda_{i} = \lambda(T_{2}^{i})$, and λ′(t) < 0 for all $t \in [t_{0}, T_{1}) \cup [T_{2}^{i}, \omega)$. For a given λ, retirement savings can be expressed as a function WR(λ). When WR(λ) > 0, the budget constraint equation W(λ, λi) = 0 allows us to express λi as an inverse function λi(λ) with $\lambda_{i}'(\lambda) < 0$.Footnote 32 If there is only one period $[t_{0}, \tilde{t})$ before retirement where λ′(t) < 0, we can define a value $\bar{\lambda} \in (\lambda(\tilde{t}), \infty)$ such that retirement wealth is positive for all $\lambda > \bar{\lambda}$.Footnote 33 This also allows us to define an inverse function T1(λ) with λ′(t) < 0 for all t∊[t0, T1).
Our problem can then be formulated as a single equation $g(\lambda) = \lambda - \sum_{i=1}^{N} p_{i}\lambda_{i}(\lambda) = 0$ with g′(λ) > 0. The range $(\lambda_{1}, \lambda_{2}) = (\bar{\lambda}, \infty)$ of values of λ to consider can be divided into two sub-intervals, $(\bar{\lambda}, \lambda(t_{0}))$ and [λ(t0), ∞): the case T1 > t0 applies if g(λ(t0)) > 0 and the case T1 = t0 applies if g(λ(t0)) ⩽ 0. For the case T1 > t0, when g(λ(t0)) > 0 there exists a unique solution $\lambda \in (\bar{\lambda}, \lambda(t_{0}))$ to g(λ) = 0 since we can show that $g(\bar{\lambda}) < 0$. For the case T1 = t0, when g(λ(t0)) ⩽ 0 there exists a unique solution λ∊[λ(t0), ∞) to g(λ) = 0 since we can show that g(∞) > 0.Footnote 34
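The bisection step described in this section can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the functions `lambda_i` and `g` below are toy stand-ins chosen only because each λi(λ) is decreasing in λ, whereas in the model λi(λ) is obtained by inverting the budget constraint W(λ, λi) = 0. The bracket endpoints are also hypothetical; in practice the upper bound would be enlarged until g is positive.

```python
def bisect(g, lo, hi, tol=1e-10, max_iter=200):
    """Root of an increasing function g on [lo, hi], given g(lo) < 0 < g(hi)."""
    if not (g(lo) < 0 < g(hi)):
        raise ValueError("the root is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid      # the root lies in the lower half
        else:
            lo = mid      # the root lies in the upper half
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Toy inputs (assumptions, not the paper's budget constraints).
P = [0.5, 0.5]    # state probabilities p_i
K = [1.2, 0.8]    # hypothetical constants defining lambda_i(lam) = k / lam


def lambda_i(lam, k):
    """Toy inverse function: strictly decreasing in lam, as the text requires."""
    return k / lam


def g(lam):
    """g(lam) = lam - sum_i p_i * lambda_i(lam), strictly increasing in lam."""
    return lam - sum(p * lambda_i(lam, k) for p, k in zip(P, K))


if __name__ == "__main__":
    root = bisect(g, 0.1, 10.0)
    print(f"lambda = {root:.8f}, g(lambda) = {g(root):.2e}")
```

Because g is strictly increasing, each iteration halves the bracket and the method cannot converge to a spurious root, which is what makes the one-equation formulation attractive.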
Appendix B: Closed-form equations for the budget constraint
This appendix suggests a set of realistic assumptions to express the budget constraint as a series of closed-form equations. As is customary in this literature, the power utility function $u(c) = c^{1-\gamma}/(1-\gamma)$, $\gamma \neq 1$, is used. For the mortality and income assumptions, we opt for functional forms that are flexible enough to fit any discrete mortality table or income profile. For mortality, it is assumed that a constant force of mortality μj applies at each age j. For the pre-retirement income function, we assume that the income process is continuous and that it grows at a rate gj at age j. Let J(t) = j if j ⩽ t < j + 1. The survival probability function for J(t) > t0 is then given by $p_{t_{0},t} = e^{-\sum_{l=t_{0}}^{J(t)-1} \mu_{l} - \mu_{J(t)}(t - J(t))}$ and the pre-retirement income process by $y_{t} = y_{J(t)} e^{g_{j}(t - J(t))}$.
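The two functional forms above translate directly into code. The following Python sketch is illustrative only: the function names and the dictionary-based inputs are assumptions made for this example, and the numerical values of μj, gj, and starting income are made up. The helper `force_from_q` uses the standard relation μj = −ln(1 − qj) between a one-year death probability qj and a constant force of mortality over that year, which is one way a discrete mortality table could be fitted.

```python
import math
from typing import Dict, Mapping


def force_from_q(q: Mapping[int, float]) -> Dict[int, float]:
    """Constant force of mortality per age implied by one-year death probabilities."""
    return {age: -math.log(1.0 - q_age) for age, q_age in q.items()}


def survival_prob(t: float, t0: int, mu: Mapping[int, float]) -> float:
    """p_{t0,t} = exp(-sum_{l=t0}^{J(t)-1} mu_l - mu_{J(t)} * (t - J(t)))."""
    j = math.floor(t)  # J(t): the integer age j with j <= t < j + 1
    # When J(t) == t0 the sum is empty and only the partial-year term remains.
    cum_hazard = sum(mu[l] for l in range(t0, j)) + mu[j] * (t - j)
    return math.exp(-cum_hazard)


def income_levels(y0: float, t0: int, g: Mapping[int, float]) -> Dict[int, float]:
    """Income at integer ages, chained so that the income path is continuous."""
    y = {t0: y0}
    for j in sorted(g):
        y[j + 1] = y[j] * math.exp(g[j])
    return y


def income(t: float, y_age: Mapping[int, float], g: Mapping[int, float]) -> float:
    """y_t = y_{J(t)} * exp(g_{J(t)} * (t - J(t)))."""
    j = math.floor(t)
    return y_age[j] * math.exp(g[j] * (t - j))


# Made-up inputs for ages 30 to 32 (illustrative values only).
mu = {30: 0.01, 31: 0.02, 32: 0.03}
g = {30: 0.03, 31: 0.02, 32: 0.01}
y_age = income_levels(50_000.0, 30, g)

p_example = survival_prob(32.5, 30, mu)  # exp(-(0.01 + 0.02 + 0.03 * 0.5))
y_example = income(30.5, y_age, g)       # 50000 * exp(0.03 * 0.5)
```

Because both the cumulative hazard and the log of income are piecewise linear in t, their present values over any sub-interval within a single age reduce to integrals of exponentials.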
The budget constraint W(λ, λi) = 0 in equation (A.5) is equal to the present value of savings over the interval $[T_{1}, T_{2}^{i}]$. From equations (13) and (4), savings are given by

Note that in equation (B.1), $c_{t}^{*}(\lambda)$ can take different forms: it can be the interior solution given in equation (22) or it can be Ck or Ch given in equations (14) and (15). Thus, the budget constraint is the sum of the present value of savings over a series of sub-intervals [t, T] where $c_{t}^{*}(\lambda)$ takes the same form. For each interval [t, T], the present value of savings can be expressed in closed form if we can integrate the present values of the functions yt, $c_{t}^{*}(\lambda)$, Ht,h, $B_{t,h}^{S}$, and some constants, which can easily be done as follows:





These results can be combined with equations (14), (15), and (B.1) to obtain the present value of savings over an interval [t, T]. By multiplying them by $e^{r(R - T_{1})}$, they can also be used to compute the retirement savings, tax liability, and tax subsidy measures in Section 7.
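To illustrate why such integrals are tractable, the Python sketch below computes the present value of the income stream over a sub-interval [t, T] lying within a single age j, and checks the closed form against a numerical integral. It is not a reproduction of the paper's equations: it assumes, for this example only, that discounting is at a constant rate r back to time t, and the function names and input values are hypothetical.

```python
import math


def pv_income_closed_form(y_t: float, g_j: float, r: float, t: float, T: float) -> float:
    """Integral over s in [t, T] of y_t * exp(g_j * (s - t)) * exp(-r * (s - t)).

    Assumes [t, T] lies within one age, so the growth rate g_j is constant.
    """
    k = g_j - r
    if abs(k) < 1e-12:
        return y_t * (T - t)  # limiting case g_j == r
    return y_t * math.expm1(k * (T - t)) / k


def pv_income_numeric(y_t: float, g_j: float, r: float, t: float, T: float,
                      n: int = 100_000) -> float:
    """Midpoint-rule approximation of the same integral, used only as a check."""
    h = (T - t) / n
    return h * sum(y_t * math.exp((g_j - r) * (i + 0.5) * h) for i in range(n))


# Illustrative values only.
closed = pv_income_closed_form(50_000.0, 0.03, 0.04, 30.0, 30.75)
numeric = pv_income_numeric(50_000.0, 0.03, 0.04, 30.0, 30.75)
```

The closed form needs a single exponential evaluation per sub-interval, whereas the numerical check needs many, which is consistent with the speed advantage claimed for closed-form budget constraint equations.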
Appendix C: Value function
The welfare gain is computed by first determining the percentage increase in consumption over the interval [T1, T2] that would make the individual indifferent between the Roth and traditional cases. With the exception of the interval [T1, T2], this approach is equivalent to that presented in other works such as Cocco et al. (2005).Footnote 35 To obtain a dollar measure, the percentage increase is multiplied by the value of consumption $C_{T_{1},T_{2}}^{\rm Roth}$. Letting $V_{T_{1},T_{2}} = \int_{T_{1}}^{T_{2}} f(t)\, u(c_{t}^{*})\, {\rm d}t$ denote the value function, the welfare measure can be expressed as
