
Discrete-time risk-aware optimal switching with non-adapted costs

Published online by Cambridge University Press:  06 June 2022

Randall Martyr*
Affiliation:
Kingston University London
John Moriarty*
Affiliation:
Queen Mary University of London
Magnus Perninge*
Affiliation:
Linnaeus University
*
*Postal address: River House, 53–57 High Street, Surrey, UK. Email address: r.martyr@kingston.ac.uk
**Postal address: School of Mathematical Sciences, Queen Mary University of London, Mile End Road, London, UK. Email address: jmoriarty@qmul.ac.uk
***Postal address: PG Vejdes väg 7, 352 52 Växjö, Sweden.

Abstract

We solve non-Markovian optimal switching problems in discrete time on an infinite horizon, when the decision-maker is risk-aware and the filtration is general, and establish existence and uniqueness of solutions for the associated reflected backward stochastic difference equations. An example application to hydropower planning is provided.

Type
Original Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

1.1. Optimal switching problems

Optimal switching problems involve an agent controlling a system by successively switching an operational mode between a discrete set of choices. Time may be either continuous or discrete, and in all cases the latter is useful for numerical work (see for example [3]). In related contexts, risk sensitivity with respect to uncertain costs has been modelled using nonlinear expectations; see [1] for example. This feature is particularly appropriate in data-driven settings where models themselves may be uncertain. Examples include when the probability model is derived from numerical weather predictions depending on unknown physical parameters, or, alternatively, in model-free reinforcement learning. In the latter context, recent work has applied a general analytic framework for risk sensitivity [10].

Taking a probabilistic approach, in this paper we consider a general filtration, which interacts with the nonlinear expectation. More precisely, let $ \mathbb{T}$ be a subset of $ \mathbb N_0 \,:\!=\, \{0,1,\ldots\}$ , and let $ \big\{\tilde g_{\xi_{t-1},\xi_t}(t)\big\}_{t \in \mathbb{T}}$ be a sequence of random costs dependent on a switching strategy $ \xi$ , i.e. a random sequence $ (\xi_t)_{t \in \{{-}1\}\cup\mathbb T}$ taking values in a finite set $ \mathcal I\,:\!=\, \{1,\ldots,m\}$ , representing the set of operating modes. We do not require that every cost be observable, which, for example, enables study of the interaction between delayed or missing observations and risk sensitivity. The time horizon is either infinite ( $ \mathbb T=\mathbb N_0$ ) or finite ( $ \mathbb T=\{0,1,\ldots,T\}$ for some finite $ T\geq 0$ ), and the value of the switching problem is defined under a nonlinear expectation (cf. Equations (2.1) and (3.2) below). Optimal stopping problems (see, for example, [1]) are recovered in the special case of two modes (i.e. $ m=2$ ), when optimisation is performed over strategies $ \xi$ with a single jump.

1.2. Setup and related work

We have a probability space $ (\Omega,\mathcal{F},\mathbb{P})$ and a filtration $ \mathbb{G} = \{\mathcal{G}_{t}\}_{t \in \mathbb{T}}$ of sub- $ \sigma$ -algebras of $ \mathcal{F}$ . Given operating modes $ \mathcal{I}\,:\!=\,\{1,\ldots,m\}$ and essentially bounded random variables $ g = \{g_{i}(t) \colon i \in \mathcal{I}\}_{t \in \mathbb{T}}$ and $ c = \{c_{i,j}(t) \colon i,j \in \mathcal{I}\}_{t \in \mathbb{T}}$ on $ (\Omega,\mathcal{F},\mathbb{P})$ , we are interested in solving an optimal switching problem with running costs g and switching costs c when the information available to the decision-maker is given progressively according to $ \mathbb{G}$ , and where a dynamic measure of risk sensitivity is used which generalises the usual sequence $ \{\mathbb{E}[\cdot \vert \mathcal{G}_{t}]$ , $ t \in \mathbb{T}$ } of conditional expectations with respect to $ \mathbb{G}$ . For the following discussion we set

(1.1) \begin{align} \tilde{g}_{\xi_{t-1},\xi_t}(t) \,:\!=\, g_{\xi_t}(t) + c_{\xi_{t-1},\xi_t}(t).\end{align}

Note that we are considering a setting where each of the costs $ \big\{\tilde g_{i,j}(t) \colon i,j \in\mathcal I\big\}_{t\in\mathbb T}$ is measurable with respect to the $ \sigma$ -algebra $ \mathcal F$ and $ \mathbb G$ is any filtration with $ \mathcal G_t\subset \mathcal F$ for all $ t\in\mathbb T$ . We thus may have, but do not limit ourselves to, the situation where $ \mathbb G$ is the natural filtration generated by $ \{\tilde g_{i,j}(t) \colon i,j \in\mathcal I\}_{t\in\mathbb T}$ . To our knowledge, the necessary and sufficient conditions we provide for an optimal switching strategy in this infinite-horizon setting under general filtration are novel and extend results in, for example, [1, 4, 8, 11, 15].

The rest of the paper is structured as follows. Section 2 presents our main results in the finite-horizon setting, and these are extended to infinite horizon in Section 3. In both cases, the solution to the optimal switching problem is used to establish the existence of solutions to the associated reflected backward stochastic difference equations, and we also prove uniqueness of the solution. We close the paper with two examples. Section 4 briefly confirms that the approach taken to missing or delayed observations is capable of changing both the value process and the optimal strategy. In Section 5 we apply neural networks to obtain numerical solutions to a non-Markovian hydropower planning problem with non-adapted costs and examine the risk sensitivity of the solutions.

2. Finite-horizon risk-aware optimal switching under general filtration

For the rest of the paper, we establish the following notation:

  • Let $ m\mathcal F$ denote the space of random variables on $ (\Omega,\mathcal{F},\mathbb{P})$ .

  • Let $ L^{\infty}_{\mathcal{F}}$ be the subspace of essentially bounded random variables on $ (\Omega,\mathcal{F},\mathbb{P})$ .

  • Let $ \mathbb{G} = \{\mathcal{G}_{t}\}_{t \in \mathbb{T}}$ be a filtration, with $ \mathcal{G} = \bigvee_{t \in \mathbb{T}}\mathcal{G}_{t}$ the $ \sigma$ -algebra generated by all $ \mathcal{G}_{t}$ and $ \mathcal G\subset\mathcal F$ .

  • Let $ T<\infty$ be a finite time horizon, and for $ 0 \le t \le T$ , let $ \mathscr{T}_{[t,T]}$ (resp. $ \mathscr{T}_{t}$ ) denote the set of $ \mathbb{G}$ -stopping times with values in $ t,\ldots,T$ (resp. $ t,t+1,\ldots$ ).

  • Let $ \rho$ be a $ \mathbb{G}$ -conditional risk mapping: a family of mappings $ \{\rho_{t}\}_{t \in \mathbb{T}}$ , $ \rho_{t} \colon L^{\infty}_{\mathcal{F}} \to L^{\infty}_{\mathcal{G}_{t}}$ , satisfying normalisation, conditional translation invariance, and monotonicity (see Appendix A.1).

  • For $ s, t \in \mathbb{T}$ with $ s \leq t$ , let $ \rho_{s,t}$ be the finite-horizon aggregated (or nested) risk mapping generated by $ \rho$ ([5, 14, 15, 20, 21]; see also [2]; a toy numerical sketch follows below): that is, $ \rho_{t,t}(W_{t}) = \rho_{t}(W_{t})$ and

    \begin{equation*} \rho_{s,t}(W_{s},\ldots,W_{t}) = \rho_{s}\big(W_{s} + \rho_{s+1}\big(W_{s+1} + \cdots +\rho_{t-1}\big(W_{t-1} + \rho_{t}(W_{t})\big)\cdots \big)\big),\; s < t. \end{equation*}
  • All inequalities are interpreted in the $ \mathbb{P}$ -almost-sure sense.

For the finite-time-horizon setting of Section 2 we also set $ \mathbb{T} = \{0,1,\ldots,T\}$ .
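To fix ideas, the following is a minimal numerical sketch of an aggregated risk mapping. It assumes a toy finite sample space driven by a binary noise sequence with uniform probabilities, and uses the conditional entropic risk $ \rho_t(X) = \gamma^{-1}\log \mathbb{E}[\exp(\gamma X) \mid \mathcal{G}_t]$ as the one-step mapping; the horizon, the risk-aversion parameter and the cost sequence are illustrative choices only and are not imposed by the results that follow. The backward nesting in rho_nested mirrors the displayed formula for $ \rho_{s,t}$ .

```python
import itertools
import numpy as np

T = 3          # toy horizon (hypothetical)
gamma = 1.0    # risk-aversion parameter of the entropic mapping (hypothetical)

# Sample space: all binary noise paths of length T+1, with uniform probability.
omegas = np.array(list(itertools.product([0, 1], repeat=T + 1)))
probs = np.full(len(omegas), 1.0 / len(omegas))

def rho(t, X):
    """One-step mapping rho_t(X) = (1/gamma) log E[exp(gamma X) | G_t], where G_t is
    generated by the first t+1 noise coordinates; returns a vector over omega that is
    constant on each atom of G_t."""
    out = np.empty(len(X))
    keys = [tuple(w[: t + 1]) for w in omegas]
    for key in set(keys):
        idx = [i for i, k in enumerate(keys) if k == key]
        p = probs[idx] / probs[idx].sum()
        out[idx] = np.log(p @ np.exp(gamma * X[idx])) / gamma
    return out

def rho_nested(s, t, W):
    """Aggregated mapping rho_{s,t}(W_s,...,W_t), built by nesting backwards in time."""
    acc = rho(t, W[t])
    for u in range(t - 1, s - 1, -1):
        acc = rho(u, W[u] + acc)
    return acc

# Illustrative cost sequence W_t(omega), depending on the noise up to time t.
W = [omegas[:, : t + 1].sum(axis=1).astype(float) for t in range(T + 1)]
print(rho_nested(0, T, W))   # risk-adjusted aggregate cost, evaluated at time 0
```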

The value process for the finite-horizon optimal switching problem is

(2.1) \begin{equation} V_t^{i} \,:\!=\, \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_t}\rho_{t,T}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}(T)\big),\end{equation}

where $ \tilde g_{i,j}(t) \,:\!=\, g_j(t)+c_{i,j}(t)$ , $ \mathcal U^i_t$ is the set of $\mathbb{G}$ -adapted strategies $ \xi$ with $ \xi_{t-1} = i$ , and the infimum of the empty set is taken to be $ \infty$ . Since for each t the costs $ c_{i,i}(t)$ depend only on i and may therefore be accounted for in the term $ g_{i}(t)$ , without loss of generality we may make the following assumption.

Assumption 2.1. For all $ i \in \mathcal{I}$ we have $ c_{i,i}(t) = 0$ for all $ t \in \mathbb{T}$ .

2.1. Dynamic programming equations

The use of aggregated risk mappings provides sufficient structure for dynamic programming. In our non-Markovian setting, appropriate equations are

(2.2) \begin{align}\begin{cases}\hat V_T^{i}=\min_{j\in\mathcal I} \rho_T\big(\tilde g_{i,j}(T)\big),& {}\\[6pt]\hat V_t^{i}=\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+\hat V^{j}_{t+1}\Big),& \text{for} \ 0\leq t<T.\end{cases}\end{align}

(The random fields $ \big\{\hat V_t^{i}: i\in\mathcal I, t\in \mathbb{T}\big\}$ coincide with Snell envelopes; see Remark 2.3.) We note by induction that $ \hat V_t^{i} \in L^\infty_{\mathcal G_t}$ for each $ i\in\mathcal I$ and $ t \in \mathbb{T}$ .
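Before the verification result, here is a minimal numerical sketch of how the recursion (2.2), together with the greedy selection that Theorem 2.1 below shows to be optimal, can be evaluated on a toy example. The scenario space, the conditional worst-case mapping $ \rho_t(\cdot) = \mathop{\text{ess} \, \text{sup}}(\cdot \mid \mathcal{G}_t)$ , and the randomly generated bounded costs are all hypothetical choices made only for illustration.

```python
import itertools
import numpy as np

T, m = 3, 2                         # toy horizon and number of modes (hypothetical)
rng = np.random.default_rng(0)

# Finite sample space of binary noise paths; G_t is generated by the first t+1 coordinates.
omegas = np.array(list(itertools.product([0, 1], repeat=T + 1)))
n = len(omegas)

def rho(t, X):
    """Conditional worst-case mapping ess sup(X | G_t); it is normalised,
    conditionally translation invariant and monotone."""
    out = np.empty(n)
    keys = [tuple(w[: t + 1]) for w in omegas]
    for key in set(keys):
        idx = [i for i, k in enumerate(keys) if k == key]
        out[idx] = X[idx].max()
    return out

# Bounded running costs g_i(t) and switching costs c_{i,j}(t); tilde g_{i,j} = g_j + c_{i,j}.
g = rng.uniform(0.0, 1.0, size=(T + 1, m, n))
c = rng.uniform(0.0, 0.2, size=(T + 1, m, m, n))
for i in range(m):
    c[:, i, i, :] = 0.0             # Assumption 2.1
g_tilde = lambda t, i, j: g[t, j] + c[t, i, j]

# Backward recursion (2.2); V[t, i] is \hat V_t^i as a vector over omega, with V[T+1] = 0.
V = np.zeros((T + 2, m, n))
for t in range(T, -1, -1):
    for i in range(m):
        V[t, i] = np.min([rho(t, g_tilde(t, i, j) + V[t + 1, j]) for j in range(m)], axis=0)

# Greedy mode selection in the spirit of (2.4), along one fixed scenario, starting in mode 0.
mode, scenario = 0, 0
for t in range(T + 1):
    cand = [rho(t, g_tilde(t, mode, j) + V[t + 1, j])[scenario] for j in range(m)]
    mode = int(np.argmin(cand))
    print(f"t={t}: mode {mode}")
```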

Remark 2.1. For comparison, in a Markovian framework with full observation and the linear expectation, randomness stems from an $ \mathbb{R}^{k}$ -valued Markov chain $ X^{s,x} \,:\!=\, \big\{X_t^{s,x}\big\}_{s \le t \le T}$ , where $ (s,x) \in \mathbb{T} \times \mathbb{R}^{k}$ is fixed and $ X_r^{s,x}=x$ for $ 0 \le r \le s$ almost surely under $ \mathbb{P}^{(s,x)}$ , and $ \mathbb{G}$ is the natural filtration of $ X^{s,x}$ . In the Markovian case, by virtue of each strategy $ \xi$ being adapted to $ \mathbb{G}$ , for every $ t \ge 0$ there exists a function $ \Xi_{t} \colon \big(\mathbb{R}^{k}\big)^{t+1} \to \mathcal{I}$ such that $ \xi_{t} = \Xi_{t}(X_{0},\ldots,X_{t})$ . The Bellman equation is then the appropriate formulation for dynamic programming: for any $ i \in \mathcal I$ and $ (s,x) \in \mathbb{T} \times \mathbb{R}^{k}$ ,

(2.3) \begin{equation} \begin{cases}v^{i}(T,x) = \min_{j\in\mathcal I} \tilde{g}_{i,j}(T,x), \\[5pt]v^{i}(s,x) = \min_{j\in\mathcal I } \Big(\tilde{g}_{i,j}(s,x)+\mathbb E^{(s,x)}\big[v^{j}\big(s+1,X_{s+1}\big)\big]\Big),\;\; s < T, \,\end{cases}\end{equation}

where the $ v^i$ and $ \tilde{g}_{i,j}$ are deterministic functions on $ \mathbb{T} \times \mathbb{R}^{k}$ .

Theorem 2.1. The random field $ \big\{\hat V_t^{i}\,:\, i\in\mathcal I, t\in \mathbb{T}\big\}$ consists of value processes for the optimal switching problem, in the sense that

\begin{equation*} \hat V_t^{i}=V_t^{i} \quad \forall\, i \in \mathcal I, \ t\in\mathbb T.\end{equation*}

Moreover, starting from any $ 0 \le t \le T$ and $ i \in \mathcal{I}$ , an optimal strategy $ \xi^{*} \in \mathcal U^{i}_t$ can be defined as follows:

(2.4) \begin{equation} \begin{cases} \xi_{t-1}^* = i, \\[4pt]\xi_s^* \in {\arg\min}_{j \in \mathcal I}\rho_s\Big(\tilde g_{\xi_{s-1}^*,j}(s)+\hat V^{j}_{s+1}\Big), \quad t \le s < T,\\[5pt]\xi_T^* \in {\arg\min}_{j \in \mathcal I}\rho_T\Big(\tilde g_{\xi_{T-1}^*,j}(T)\Big). \end{cases}\end{equation}

Proof. Note that the result holds trivially for $ t=T$ . We will apply a backward induction argument and assume that for $ s=t+1,t+2,\ldots,T$ and all $ i\in\mathcal I$ we have $ \hat V_s^{i}=V_s^{i}= \rho_{s,T}\left(\tilde g_{i,\xi_{s}}(s),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}(T)\right)$ , where $ \xi_{t-1}=i$ and

\begin{equation*} \xi_s\in {\arg\min}_{j\in\mathcal I}\rho_s\Big(\tilde g_{\xi_{s-1},j}(s)+ V^{j}_{s+1}\Big), \end{equation*}

with $ V^{j}_{T+1} \,:\!=\, 0$ for all $ j\in\mathcal I$ .

The induction hypothesis implies that

\begin{align*} \hat V_t^{i}&= \min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+ V^{j}_{t+1}\Big) \\ &=\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+\mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^j_{t+1}}\rho_{t+1,T}\Big(\tilde g_{j,\xi_{t+1}}(t+1),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}(T)\Big)\Big). \end{align*}

For any $ \xi^{\prime}\in \mathcal U^i_{t}$ we note that by monotonicity and conditional translation invariance we have

\begin{align*} \hat V_t^{i}&\leq\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+\rho_{t+1,T}\Big(\tilde g_{j,\xi^{\prime}_{t+1}}(t+1),\ldots,\tilde g_{\xi^{\prime}_{T-1},\xi^{\prime}_{T}}(T)\Big)\Big) \\ &\leq \sum_{j=1}^m \mathbf{1}_{\{\xi^{\prime}_t=j\}}\rho_t\Big(\tilde g_{i,j}(t)+\rho_{t+1,T}\Big(\tilde g_{j,\xi^{\prime}_{t+1}}(t+1),\ldots,\tilde g_{\xi^{\prime}_{T-1},\xi^{\prime}_{T}}(T)\Big)\Big) \\ &=\rho_t\left(\sum_{j=1}^m\mathbf{1}_{\{\xi^{\prime}_t=j\}}\Big\{\tilde g_{i,j}(t)+\rho_{t+1,T}\Big(\tilde g_{j,\xi^{\prime}_{t+1}}(t+1),\ldots,\tilde g_{\xi^{\prime}_{T-1},\xi^{\prime}_{T}}(T)\Big)\Big\}\right) \\ &=\rho_{t,T}\Big(\tilde g_{i,\xi^{\prime}_t}(t),\ldots,\tilde g_{\xi^{\prime}_{T-1},\xi^{\prime}_{T}}(T)\Big). \end{align*}

Taking the infimum over all $ \xi^{\prime}\in\mathcal U^i_{t}$ we conclude that $ \hat V_t^{i}\leq V_t^{i}$ . However, letting $ \xi^{\prime}_{t-1}=i$ and defining

\begin{equation*} \xi^{\prime}_s\in {\arg\min}_{j\in\mathcal I}\rho_s\Big(\tilde g_{\xi^{\prime}_{s-1},j}(s)+\hat V^{j}_{s+1}\Big) \end{equation*}

for $ s=t,\ldots,T$ , with $ \hat V^{j}_{T+1} \,:\!=\, 0$ for all $ j\in\mathcal I$ , we find that

\begin{align*} \qquad\qquad\qquad\hat V_t^{i}&= \rho_t\Big(\tilde g_{i,\xi^{\prime}_t}(t)+\mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\,\mathcal U_{t+1}}\rho_{t+1,T}\Big(\tilde g_{\xi^{\prime}_t,\xi_{t+1}}(t+1),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}(T)\Big)\Big) \\[2pt] &=\rho_{t,T}\Big(\tilde g_{i,\xi^{\prime}_t}(t),\ldots,\tilde g_{\xi^{\prime}_{T-1},\xi^{\prime}_{T}}\Big) \\[2pt] &\geq V^i_t.\\\end{align*}

2.2. Relation to systems of RBS $ \Delta$ Es

We now introduce reflected backward stochastic difference equations (RBS $ \Delta$ Es), a class of equations relevant to both optimal stopping and switching problems, studied systematically in [1] for finite-state processes. Let $ \mathcal L^{\infty}_{\mathbb G,T} \,:\!=\, \otimes_{t = 0}^{T}L^\infty_{\mathcal G_t}$ . To avoid excessive notation, some notation for scalar-valued processes will be reused for vector-valued ones, with the interpretation that all components are in the same space. Similarly, inequalities and martingale properties will be understood componentwise, and given $ i \in \mathcal{I}$ we write $ \mathcal I^{-i} \,:\!=\, \mathcal I \setminus\{i\}$ .

Definition 2.1 (Finite horizon RBS $ \Delta$ Es). With $ \mathbb T=\{0,\ldots,T\}$ , where $ 0\leq T<\infty$ , let $ Y = \{Y_{t}\}_{t\in\mathbb T}$ , $ M = \{M_{t}\}_{t\in\mathbb T}$ , and $ A = \{A_{t}\}_{t\in\mathbb T}$ be $ \mathbb{G}$ -adapted $ \mathbb R^m$ -valued processes satisfying

(2.5) \begin{equation} \begin{cases} Y^i_t = \min_{j\in\mathcal I}\rho_T\big(\tilde g_{i,j}(T)\big)+\sum_{s=t}^{T-1}\rho_s\Big(g_{i}(s)+\Delta M^i_{s+1}\Big) \\[5pt] \qquad-\big(M^i_T-M^i_t\big) -\big(A^i_{T}-A^i_t\big),\quad\forall \; t \in \mathbb{T},\\[5pt] Y^i_t \leq \min_{j\in\mathcal I^{-i}} \rho_t\Big(\tilde g_{i,j}(t)+Y^j_{t+1}\Big),\quad\forall \; t \in \mathbb{T}, \\[5pt] \sum_{t=0}^{T-1}\Big(Y^i_t - \min_{j\in\mathcal I^{-i}} \rho_t\Big(\tilde g_{i,j}(t)+Y^j_{t+1}\Big)\Big)\Delta A^i_{t+1}=0. \end{cases} \end{equation}

A triple $ (Y,M,A)\in \big(\mathcal L^{\infty}_{\mathbb G,T}\big)^3$ is said to be a solution to the system of RBS $ \Delta$ Es (2.5) if M is a $ \mathbb G$ -adapted $ \rho_{s,t}$ -martingale (applying the definition in Section A.3 of the appendix), A is non-decreasing and $ \mathbb G$ -predictable (with $ M_0=A_0=0$ ), and (Y, M, A) satisfies (2.5). A solution (Y, M, A) is called unique if any other solution $ (Y^{\prime}, M^{\prime}, A^{\prime})$ is indistinguishable as a process from (Y, M, A).

Remark 2.2. The martingale characterisation of the optimal switching value process (see for example [16] under the linear expectation) may be derived from the associated RBS $ \Delta$ E. Under a risk mapping $ \rho$ , however, the ‘driver’ $ \rho_{t}\big(g_i(t) + \Delta M^i_{t+1}\big)$ in (2.5) depends on the $ \{\rho_{s,t}\}$ -martingale difference $ \Delta M^i_{t+1}$ , which is natural for general (infinite-state) backward stochastic difference equations; see [6]. Note also that the driver is a function of the mappings $ \omega \mapsto \Delta M^{i}_{t+1}(\omega)$ and $ \omega \mapsto g_{i}(\omega,t)$ and not the realised values of these random variables. Also, we refer to the last line in Equation (2.5) as the Skorokhod condition.

The optimal switching problem (2.1) is related to this system of RBS $ \Delta$ Es through the following result.

Theorem 2.2. The system of RBS $ \Delta$ Es (2.5) has a unique solution (Y, M, A). Furthermore, we have $ Y=V$ .

Proof. We divide the proof into two parts:

Existence: We aim to find a family of $ \rho_{s,t}$ -martingales $ M = \{M^{i}\}_{i \in \mathcal{I}}$ and non-decreasing $ \mathbb G$ -predictable processes $ A = \{A^{i}\}_{i \in \mathcal{I}}$ such that (V, M, A) solves (2.5). For every $ i \in \mathcal{I}$ , define the sequence $ \big\{A^i_t\big\}_{t=0}^{T}$ by

\begin{equation*} \begin{cases} A^i_0=0,\\[4pt] A^i_{t}= A^i_{t-1}+\rho_{t-1}\big(g_i(t-1)+V^i_t\big)-V^i_{t-1}, \quad t = 1,\ldots,T. \end{cases} \end{equation*}

We note that $ A^i$ is $ \mathbb G$ -predictable and non-decreasing since, by Theorem 2.1 and the backward induction formula (2.2), $ V_t^{i}\leq \rho_t\big(g_{i}(t)+V^{i}_{t+1}\big)$ . Furthermore, for $ t < T$ we have $ \Delta A^i_{t+1}=\rho_{t}\big(g_i(t)+V^i_{t+1}\big)-V^i_{t}=0$ on $ \big\{V^i_{t}=\rho_{t}\big(g_i(t)+V^i_{t+1}\big)\big\}\supset \big\{V^i_{t}<\min_{j\in\mathcal I^{-i}} \rho_t\big(\tilde g_{i,j}(t)+V^{j}_{t+1}\big)\big\}$ .

Let $ M^i$ be the martingale in the Doob decomposition (see Lemma A.5 in Appendix A.3) for $ V^i$ ; that is, $ M^i_0=0$ and $ \Delta M^i_{t+1}=V^i_{t+1}-\rho_{t}\big(V^i_{t+1}\big)$ . We have

\begin{align*} V^i_{t} &= \min_{j\in\mathcal I}\rho_T\big(\tilde g_{i,j}(T)\big)+\sum_{s=t}^{T-1} \big(V^i_{s}-V^i_{s+1}\big). \end{align*}

Now, as

\begin{align*} \Delta M^i_{s+1}+\Delta A^i_{s+1}&=V^i_{s+1}-\rho_{s}\big(V^i_{s+1}\big)+\rho_{s}\big(g_i(s)+V^i_{s+1}\big)-V^i_{s} \\[3pt] &=V^i_{s+1}+\rho_{s}\big(g_i(s)+V^i_{s+1}-\rho_{s}\big(V^i_{s+1}\big)\big)-V^i_{s} \\[3pt] &=V^i_{s+1}-V^i_{s}+\rho_{s}\big(g_i(s)+\Delta M^i_{s+1}\big), \end{align*}

we get $ V^i_{s}-V^i_{s+1}=\rho_{s}\big(g_i(s)+\Delta M^i_{s+1}\big)-\Delta M^i_{s+1}+\Delta A^i_{s+1}$ , and thus

\begin{align*} V^i_{t} = {} & \min_{j\in\mathcal I}\rho_T\big(\tilde g_{i,j}(T)\big)+\sum_{s=t}^{T-1}\rho_s\big(g_{i}(s)+\Delta M^i_{s+1}\big) -\big(M^i_T-M^i_t\big) \\ & -\big(A^i_T-A^i_t\big). \end{align*}

We conclude that (V, M, A) is a solution to the RBS $ \Delta$ E (2.5).

Uniqueness: Suppose that (Y, N, B) is another solution. Then

(2.6) \begin{equation} \Delta Y^i_{t+1} = -\rho_t\big(g_{i}(t)+\Delta N^i_{t+1}\big)+\Delta N^i_{t+1}+\Delta B^i_{t+1}. \end{equation}

Applying $ \rho_t$ on both sides gives

\begin{align*} \rho_t\big(\Delta Y^i_{t+1}\big) &=-\rho_t\big(g_{i}(t)+\Delta N^i_{t+1}\big)+\rho_t\big(\Delta N^i_{t+1}+\Delta B^i_{t+1}\big) \\ &=-\rho_{t}\big(g_{i}(t)+\Delta N^i_{t+1}\big)+\Delta B^i_{t+1}, \end{align*}

since, by our assumption on solutions to the RBS $ \Delta$ E, $ \Delta B^i_{t+1}$ is $ \mathcal G_t$ -measurable and $ N^i$ is a martingale. Inserted into Equation (2.6), this gives

\begin{align*} \Delta N^i_{t+1}&=\Delta Y^i_{t+1} + \rho_t\big(g_{i}(t)+\Delta N^i_{t+1}\big)-\Delta B^i_{t+1} \\ &=\Delta Y^i_{t+1}-\rho_t\big(\Delta Y^i_{t+1}\big) \\ &=Y^i_{t+1}-\rho_t\big(Y^i_{t+1}\big) \end{align*}

and

\begin{align*} \Delta B^i_{t+1}&=\rho_t\big(\Delta Y^i_{t+1}\big) + \rho_t\big(g_{i}(t)+\Delta N^i_{t+1}\big) \\ &=\rho_t\big(\Delta Y^i_{t+1}\big) + \rho_t\big(g_{i}(t)+Y^i_{t+1}-\rho_t\big(Y^i_{t+1}\big)\big) \\ &=\rho_t\big(g_{i}(t)+Y^i_{t+1}\big)-Y^i_t. \end{align*}

We conclude that

(2.7) \begin{equation} \begin{cases} \Delta N^{i}_{t+1}=Y^i_{t+1}-\rho_t\big(Y^i_{t+1}\big),\\[4pt] \Delta B^{i}_{t+1}=\rho_{t}\big(g_i(t)+Y^i_{t+1}\big)-Y^i_t, \end{cases} \end{equation}

and in particular we have that, given $ Y\in\mathcal L^{\infty}_{\mathbb G,T}$ , there is at most (up to indistinguishability of processes) one pair (N, B) such that (Y, N, B) solves the RBS $ \Delta$ E (2.5).

Since (Y, N, B) solves the RBS $ \Delta$ E (2.5) we have that

\begin{equation*}Y^i_t \leq \min_{j\in\mathcal I^{-i}} \rho_t\Big(\tilde g_{i,j}(t)+Y^j_{t+1}\Big),\end{equation*}

and

\begin{align*} Y^i_t &= Y^i_{t+1}+\rho_t\big(g_{i}(t)+\Delta N^i_{t+1}\big)-\big(N^i_{t+1}-N^i_t\big)-\big(B^i_{t+1}-B^i_t\big) \\ &\leq Y^i_{t+1}+\rho_t\big(g_{i}(t)+\Delta N^i_{t+1}\big)-\big(N^i_{t+1}-N^i_t\big) \\ &= Y^i_{t+1}+\rho_t\big(g_{i}(t)+Y^i_{t+1}-\rho_t\big(Y^i_{t+1}\big)\big)-\big(Y^i_{t+1}-\rho_t\big(Y^i_{t+1}\big)\big) \\ &=\rho_t\big(g_{i}(t)+Y^i_{t+1}\big). \end{align*}

We conclude that $ Y^i_t \leq \min_{j\in\mathcal I} \rho_t\big(\tilde g_{i,j}(t)+Y^j_{t+1}\big)$ for all $ t \le T$ and $ i \in \mathcal I$ . For $ t=T$ this implies that, for all $ i \in \mathcal I$ , $ Y^i_T \leq \min_{j\in\mathcal I} \rho_T\big(\tilde g_{i,j}(T)\big)=V^i_T$ . Assume that $ t < T$ and $ Y^i_{t+1} \leq V^i_{t+1}$ for all $ i \in \mathcal I$ ; then

\begin{align*} Y^i_t &\leq \min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+Y^j_{t+1}\Big) \\ &\leq \min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+V^j_{t+1}\Big) \\ &\leq V^i_t. \end{align*}

Applying an induction argument, we thus find that if (Y, N, B) solves the RBS $ \Delta$ E (2.5), then $ Y^i_t\leq V^i_t$ for all $ t \le T$ and $ i \in \mathcal I$ . To arrive at uniqueness we show that the value $ Y^i_t$ is attained by a strategy, in which case the reverse inequality follows from the optimality of $ V^i_t$ .

Define the stopping time $ \bar \tau^{t,i}_1 \,:\!=\, \inf \big\{ s\geq t:\Delta B^i_{s+1}>0\big\} \wedge T$ and the $ \mathcal G_{\bar \tau^{t,i}_1}$ -measurable $ \mathcal{I}$ -valued random variable $ \bar \beta^{t,i}_1$ as a measurable selection of

\begin{equation*} \begin{cases} \mathop{\arg\min}\limits_{j\in\mathcal I^{-i}}\rho_{\bar \tau^{t,i}_1}\Bigg(\tilde g_{i,j}\Big(\bar \tau^{t,i}_1\Big)+Y^{j}_{\bar \tau^{t,i}_1+1}\Bigg), & \bar \tau^{t,i}_1 < T, \\[7pt] \mathop{\arg\min}\limits_{j\in\mathcal I}\rho_{T}\big(\tilde g_{i,j}(T)\big), & \bar \tau^{t,i}_1 = T. \end{cases} \end{equation*}

Now, as $ B^i_{\bar \tau^{t,i}_1}-B^i_t=0$ , we have for $ t\leq s< \bar \tau^{t,i}_1$ the recursion

\begin{align*} Y^i_s &= Y^i_{s+1}+\rho_s\big(g_{i}(s)+\Delta N^i_{s+1}\big)-\big(\Delta N^i_{s+1}\big) - \big(\Delta B^i_{s+1}\big) \\ &=Y^i_{s+1}+\rho_s\big(g_{i}(s)+Y^i_{s+1}-\rho_s\big(Y^i_{s+1}\big)\big)-\big(Y^i_{s+1}-\rho_s\big(Y^i_{s+1}\big)\big) \\ &=\rho_s\big(g_{i}(s)+Y^i_{s+1}\big). \end{align*}

Furthermore, by the Skorokhod condition, on $ \big\{\bar \tau^{t,i}_1 < T\big\}$ we have that

\begin{align*} Y^i_{\bar \tau^{t,i}_1}&=\min_{j\in\mathcal I^{-i}}\rho_{\bar \tau^{t,i}_1}\bigg(\tilde g_{i,j}\big(\bar \tau^{t,i}_1\big)+Y^{j}_{\bar \tau^{t,i}_1+1}\bigg) \\ &=\rho_{\bar \tau^{t,i}_1}\bigg(\tilde g_{i,\bar \beta^{t,i}_1}\big(\bar \tau^{t,i}_1\big)+Y^{\bar \beta^{t,i}_1}_{\bar \tau^{t,i}_1+1}\bigg), \end{align*}

and since $ Y^{i}_{T} = \min_{j\in\mathcal I}\rho_{T}\big(\tilde g_{i,j}(T)\big)$ , we conclude that

\begin{align*} Y^i_t=\rho_{t,\tau^{t,i}_1}\bigg(g_i(t),\ldots,g_i\Big(\bar \tau^{t,i}_1-1\Big),\tilde g_{i,\bar \beta^{t,i}_1}\Big(\bar \tau^{t,i}_1\Big)+Y^{\bar \beta^{t,i}_1}_{\bar \tau^{t,i}_1+1}\bigg), \end{align*}

with $ Y^{j}_{T+1} = 0$ for all $ j \in \mathcal{I}$ .

This process can be repeated to define

\begin{equation*}\bar \tau^{t,i}_{k+1} \,:\!=\, \inf \Big\{ s > \bar \tau^{t,i}_{k} \colon \Delta B^{\beta^{t,i}_k}_{s+1}>0\Big\} \wedge T\end{equation*}

and the $ \mathcal G_{\bar \tau^{t,i}_{k+1}}$ -measurable $ \mathcal{I}$ -valued random variable $ \bar \beta^{t,i}_{k+1}$ as a measurable selection of

\begin{equation*} \begin{cases} \mathop{\arg\min}\limits_{j\in\mathcal I^{-\beta^{t,i}_k}}\rho_{\bar \tau^{t,i}_{k+1}}\bigg(\tilde g_{\beta^{t,i}_k,j}\big(\bar \tau^{t,i}_{k+1}\big)+Y^{j}_{\bar \tau^{t,i}_{k+1}+1}\bigg), & \tau^{t,i}_{k+1} < T,\\[9pt] \mathop{\arg\min}\limits_{j\in\mathcal I}\rho_{T}\Big(\tilde g_{\beta^{t,i}_k,j}(T)\Big), & \tau^{t,i}_{k+1} = T. \end{cases} \end{equation*}

Letting $ \mathcal{N} \,:\!=\, \min\Big\{k \ge 1 \colon \bar \tau^{t,i}_{k} \ge T\Big\}$ ,

\begin{equation*} \bar\xi^{t,i}_s \,:\!=\, i\mathbf{1}_{\big[-1, \tau^{t,i}_{1}\big)}(s) + \sum_{j=1}^{\mathcal{N}} \beta_j \mathbf{1}_{\big[\bar \tau^{t,i}_{j},\bar \tau^{t,i}_{j+1}\big)}(s) + \beta_{\mathcal{N}}\mathbf{1}_{\{s=T\}}, \end{equation*}

and arguing as above, we get that

\begin{align*} \qquad\quad\qquad\qquad\qquad\qquad Y^i_t&=\rho_{t,T}\bigg(\tilde g_{i,\bar\xi^{t,i}_t}(t),\ldots,\tilde g_{\bar\xi^{t,i}_{T-1},\bar\xi^{t,i}_{T}}(T)\bigg) \\ &\geq V^i_t.\end{align*}

Given a strategy $ \xi \in\mathcal U^{i}_t$ , we can define its pairs of jump times $ \tau_{j} \ge t$ and positions $ \beta_{j} \in \mathcal{I}$ as follows:

(2.8) \begin{equation} \begin{split} \tau_{1} & = \inf \big\{s \ge t \colon \xi_{s} \neq i\big\} \wedge T, \\ \beta_{1} & = \xi_{\tau_{1}}, \\ \vdots \\ \tau_{j+1} & = \inf \big\{s > \tau_{j} \colon \xi_{s} \neq \beta_{j}\big\} \wedge T, \\ \beta_{j+1} & = \xi_{\tau_{j+1}}. \end{split}\end{equation}

(Note that constant strategies $ \xi_t \equiv i$ satisfy $ \tau_{j} = T$ and $ \beta_j = i$ for all j.)
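As a small illustration of the decomposition (2.8), a sketch such as the following converts a realised mode path into its pairs of jump times and positions; the helper name and the example path are hypothetical, and a constant path returns the single pair (T, i), in line with the note above.

```python
def jumps(xi, i, t, T):
    """Jump times and positions (2.8) of the path xi[t..T] started from mode i at time t-1.
    Returns the list of pairs (tau_j, beta_j); iteration stops once tau_j = T."""
    pairs, prev, s = [], i, t
    while True:
        while s < T and xi[s] == prev:
            s += 1
        tau = s if xi[s] != prev else T          # inf{s : xi_s != prev}, capped at T
        beta = xi[tau]
        pairs.append((tau, beta))
        if tau >= T:
            return pairs
        prev, s = beta, tau + 1

# Hypothetical example with modes {0, 1, 2}, t = 0 and T = 5.
print(jumps([0, 0, 1, 1, 2, 2], i=0, t=0, T=5))   # [(2, 1), (4, 2), (5, 2)]
print(jumps([0, 0, 0, 0, 0, 0], i=0, t=0, T=5))   # [(5, 0)]
```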

We have the following characterisation of an optimal strategy.

Corollary 2.1. A strategy $ \xi \in\mathcal U^{i}_t$ is optimal for (2.1) if

(2.9) \begin{equation} \begin{cases} A^{\beta_{j-1}}_{\tau_j}-A^{\beta_{j-1}}_{\tau_{j-1}}=0,\\[5pt] Y^{\beta_{j-1}}_{\tau_j}=\rho_{\tau_j}\Big(\tilde g_{\beta_{j-1},\beta_{j}}+Y^{\beta_j}_{\tau_j+1}\Big), \end{cases} \end{equation}

where $ \big\{\big(\tau_j,\beta_{j}\big)\big\}$ are the pairs of jump times and positions of $ \xi$ . If $ \rho$ has the strong sensitivity property (cf. Appendix A.1), then the condition (2.9) is also necessary for optimality.

Proof. Sufficiency: From the proof of Theorem 2.2 we have that

(2.10) \begin{equation} Y^i_t = \rho_{t,T}\big(\tilde g_{i,\xi_t}(t),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}\big), \end{equation}

and optimality follows by the fact that $ Y^i_t=V^i_t$ .

Necessity: Suppose $ \xi \in\mathcal U^{i}_t$ is optimal for (2.1) and $ \rho$ is strongly sensitive. Let $ \big\{\big(\tau_j,\beta_{j}\big)\big\}$ be the pairs of jump times and positions of $ \xi$ as defined in (2.8). Then, using (2.10) above, Lemma A.6, the RBS $ \Delta$ Es (2.5), and monotonicity of $ \rho$ , we have

\begin{align*} Y^i_t & = \rho_{t,T}\big(\tilde g_{i,\xi_t}(t),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}\big) \\ & = \rho_{t,\tau_{1}}\Big(g_{i}(t),\ldots,g_{i}(\tau_{1}-1),\rho_{\tau_{1},T}\big(\tilde g_{i,\beta_{1}}(\tau_{1}),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}\big)\Big) \\ & \ge \rho_{t,\tau_{1}}\Big(g_{i}(t),\ldots,g_{i}(\tau_{1}-1),\rho_{\tau_{1}}\Big(\tilde g_{i,\beta_{1}}(\tau_{1})+Y^{\beta_{1}}_{\tau_{1}+1}\Big)\Big)\\ & \ge \rho_{t,\tau_{1}}\Big(g_{i}(t),\ldots,g_{i}(\tau_{1}-1),Y^{i}_{\tau_{1}}\Big)\\ & \vdots \\ & \ge Y^{i}_{t}, \end{align*}

where we set $ Y^{j}_{T+1} \,:\!=\, 0$ for all $ j \in \mathcal{I}$ . We therefore have

\begin{equation*} \rho_{t,\tau_{1}}\Big(g_{i}(t),\ldots,g_{i}(\tau_{1}-1),\rho_{\tau_{1}}\Big(\tilde g_{i,\beta_{1}}(\tau_{1})+Y^{\beta_{1}}_{\tau_{1}+1}\Big)\Big) = \rho_{t,\tau_{1}}\Big(g_{i}(t),\ldots,g_{i}(\tau_{1}-1),Y^{i}_{\tau_{1}}\Big), \end{equation*}

and by strong sensitivity of $ \rho$ and the definition of $ A^{i}$ from Theorem 2.2, (2.9) is true for $ j = 1$ . The general case is proved by induction in a similar manner.

2.3. The special case of optimal stopping

We now consider the problem of finding

(2.11) \begin{align} F_t\,:\!=\, \mathop{\text{ess} \, \text{inf}}\limits_{\tau\in\mathscr{T}_{[t,T]}}\rho_{t,\tau}\big(f(t),\ldots,f(\tau-1),h(\tau)\big), \end{align}

for given sequences $ \{f(t)\}_{t=0}^{T}$ and $ \{h(t)\}_{t=0}^{T}$ in $ \Big(L^{\infty}_{\mathcal{F}}\Big)^{T+1}$ . This problem can be related to optimal switching with two modes $ \mathcal I\,:\!=\,\{1,2\}$ . The optimal stopping problem (2.11) is equivalent to (2.1) if we do the following:

  • Set $ g_1(t)=f(t)$ for $ 0\leq t\leq T-1$ , $ g_1(T)=h(T)$ , $ c_{1,2}\equiv h$ , and $ g_2\equiv c_{2,1}\equiv 0$ .

  • Mutatis mutandis let $ \mathcal I$ depend on the present mode. We then set $ \mathcal I(1) \,:\!=\, \{1,2\}$ when we are in mode 1 and $ \mathcal I(2) \,:\!=\, \{2\}$ when we are in mode 2. In particular this gives $ \mathcal I(2)^{-2}=\emptyset$ in (2.5). We additionally use the conventions $ \min\emptyset=\infty$ and $ -\infty \cdot 0 = \infty \cdot 0 = 0$ .

  • Optimise over strategies satisfying $ \xi_{t-1}=1$ .

We note that in this setting the recursion (2.2) gives $ V^2\equiv 0$ . The following result is then a direct consequence of Theorem 2.2:

Theorem 2.3. The value process F for the optimal stopping problem satisfies

(2.12) \begin{align} \begin{cases} F_T=\rho_T(h(T)),&{}\\[4pt] F_t=\rho_t(f(t)+F_{t+1})\wedge\rho_t(h(t)),& \: 0\le t < T, \end{cases} \end{align}

and the stopping time $ \tau_{t} \in \mathscr{T}_{[t,T]}$ defined by

(2.13) \begin{equation} \tau_{t} = \inf\left\{ t \le s \le T \colon F_{s} = \rho_{s}(h(s))\right\} \end{equation}

is optimal for (2.11). Furthermore, there exist a $ \rho_{s,t}$ -martingale M and a non-decreasing $ \mathbb G$ -predictable process A such that (F, M, A) is the unique solution to the following RBS $ \Delta$ E:

(2.14) \begin{equation} \begin{cases} F_t = \rho_T(h(T))+\sum_{s=t}^{T-1}\rho_s(f(s)+\Delta M_{s+1})-(M_T-M_t) \\[2pt] \qquad-(A_{T}-A_t),\quad \forall\,t\in\mathbb T,\\[2pt] F_t \leq \rho_t(h(t)), \quad \forall\,t\in\mathbb T,\\[5pt] \sum_{t=0}^{T-1}(F_t - \rho_t(h(t)))\Delta A_{t+1}=0. \end{cases} \end{equation}
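As a numerical illustration of Theorem 2.3, the following sketch evaluates the recursion (2.12) and the stopping rule (2.13) on a toy binary scenario space with uniform probabilities, taking the conditional entropic risk as $ \rho$ and randomly generated bounded cost sequences; all of these modelling choices are hypothetical.

```python
import itertools
import numpy as np

T, gamma = 4, 2.0                   # toy horizon and risk aversion (hypothetical)
rng = np.random.default_rng(1)
omegas = np.array(list(itertools.product([0, 1], repeat=T + 1)))
n = len(omegas)

def rho(t, X):
    """Conditional entropic risk (1/gamma) log E[exp(gamma X) | G_t], uniform probability,
    with G_t generated by the first t+1 noise coordinates."""
    out = np.empty(n)
    keys = [tuple(w[: t + 1]) for w in omegas]
    for key in set(keys):
        idx = [i for i, k in enumerate(keys) if k == key]
        out[idx] = np.log(np.mean(np.exp(gamma * X[idx]))) / gamma
    return out

f = rng.uniform(0.0, 1.0, size=(T + 1, n))      # running costs f(t)
h = rng.uniform(0.0, 2.0, size=(T + 1, n))      # stopping costs h(t)

# Backward recursion (2.12) for the value process F.
F = np.empty((T + 1, n))
F[T] = rho(T, h[T])
for t in range(T - 1, -1, -1):
    F[t] = np.minimum(rho(t, f[t] + F[t + 1]), rho(t, h[t]))

# Stopping rule (2.13), evaluated scenario by scenario from time 0.
tau0 = [min(s for s in range(T + 1) if F[s, w] == rho(s, h[s])[w]) for w in range(n)]
print("F_0 on the atoms of G_0:", np.unique(np.round(F[0], 4)))
print("optimal stopping times per scenario:", tau0)
```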

Remark 2.3. As is done in [8], Theorem 2.3 can be used to identify the optimal switching problem with a family of optimal stopping problems. If we set

\begin{equation*}h^i(t) = \begin{cases}\min_{j\in\mathcal I} \rho_T\big(\tilde g_{i,j}(T)\big),& t = T,\\[3pt]\min_{j\in\mathcal I^{-i}} \rho_t\Big(\tilde g_{i,j}(t)+\hat V^{j}_{t+1}\Big),& t < T,\end{cases}\end{equation*}

and then substitute $ h^i$ for h in (2.12) and recall (2.2), it follows by Theorem 2.3 that for each $ i \in \mathcal I$ and $ t \in \mathbb{T}$ we have

\begin{equation*}\hat V_t^{i} = \mathop{\text{ess} \, \text{inf}}\limits_{\tau\in\mathscr{T}_{[t,T]}}\rho_{t,\tau}\big(\tilde g_{i,i}(t),\ldots,\tilde g_{i,i}(\tau-1),h^i(\tau)\big).\end{equation*}

3. Infinite-horizon risk-aware optimal switching under general filtration

In many problems the horizon T is so long that it can be considered infinite, and this motivates us to extend the results obtained in Section 2 to the infinite horizon. We thus let $ \mathbb T \,:\!=\, \mathbb N_0$ and define the infinite-horizon aggregated risk mapping $ \varrho_{s} \colon \Big(L^{\infty}_{\mathcal{F}}\Big)^\mathbb{T} \to m\mathcal G_{s}$ (with $ m\mathcal G_{s}$ the set of $ \mathcal G_{s}$ -measurable random variables) by

(3.1) \begin{align} \varrho_s\big(W_s,W_{s+1},\ldots\big) = \limsup_{t \to \infty} \rho_{s,t}\big(W_s,W_{s+1},\ldots,W_t\big). \end{align}

We define the value process for the switching problem on an infinite horizon as

(3.2) \begin{equation} V_t^{i} \,:\!=\, \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_t}\varrho_{t}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\tilde g_{\xi_{t},\xi_{t+1}}(t+1),\ldots\big). \end{equation}

Definition 3.1. Let $ \mathcal L_\mathbb{G}^{\infty} \,:\!=\, \otimes_{t \in \mathbb{T}}L^\infty_{\mathcal G_t}$ and

\begin{equation*} \mathcal{L}_\mathbb{G}^{\infty,d} \,:\!=\, \Big\{W \in \mathcal L_\mathbb{G}^{\infty} \colon \lim_{s \to \infty} \mathop{\text{ess} \, \text{sup}}\limits_\omega |W_s(\omega)| = 0 \Big\}. \end{equation*}

Also, let $ \mathcal{K}^{+}_{d}$ denote the set of all non-negative deterministic sequences $ \{k_{t}\}_{t \in \mathbb{T}}$ such that the series $ \sum_{t \in \mathbb{T}}k_{t}$ converges, and define

\begin{equation*} H_\mathcal{F} \,:\!=\, {} \Big\{ W \in \big(L_\mathcal{F}^\infty\big)^{\mathbb{T}} \colon \exists \{k_{t}\}_{t \in \mathbb{T}} \in \mathcal{K}^{+}_{d} \; \text{such that} \; |W_{t}| \le k_{t} \; \forall t \in \mathbb{T}\Big\}. \end{equation*}

Remark 3.1. If $ W \in H_\mathcal{F}$ then for every $ s \in \mathbb{T}$ the limit

\begin{equation*} \varrho_{s}\big(W_{s},W_{s+1},\ldots\big) = \lim_{t \to \infty}\rho_{s,t}\big(W_{s},\ldots,W_{t}\big) \end{equation*}

exists almost surely and belongs to $ L^{\infty}_{\mathcal{G}_{s}}$ (see Lemma A.2 in the appendix). An example $ W \in H_\mathcal{F}$ is a discounted sequence $ W_{t} = \alpha^t Z_t$ for some $ \alpha \in (0,1)$ and $ \{Z_{t}\}_{t \in \mathbb{T}} \subset L^\infty_\mathcal{F}$ with $ \sup_{t}|Z_{t}| < C$ for some $ C \in (0,\infty)$ .

Assumption 3.1. There exists a sequence $ \{\bar g(t)\}_{t\in\mathbb T}\in H_{\mathcal F}$ such that $ |\tilde g_{i,j}(t)| \leq \bar g(t)$ for all $ (t,i,j)\in\mathbb T\times\mathcal I^2$ .

3.1. Dynamic programming equations

For $ (t,r) \in\mathbb T^{2}$ we set $ \hat V_{t,r}^{i} \,:\!=\, \varrho_{t}\big(g_i(t),g_i(t+1),\ldots\big)$ whenever $ t>r$ and define

(3.3) \begin{align} \hat V_{t,r}^{i}=\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+\hat V^{j}_{t+1,r}\Big) \end{align}

recursively for $ t\leq r$ . By a simple induction argument we note that for each $ i\in\mathcal I$ and $ r\in\mathbb T$ the sequence $ \big\{\hat V_{t,r}^{i} \big\}_{t\in\mathbb T}$ exists as a member of $ \mathcal L^{\infty,d}_\mathbb G$ . We have the following lemma.

Lemma 3.1. For $ 0\leq t\leq r$ and $ i\in\mathcal I$ , let $ \mathcal U^i_{t,r}\,:\!=\,\big\{\xi\in\mathcal U^i_t\,:\,\xi_s=\xi_r,\,\forall s> r\big\}$ . Then

\begin{align*} \hat V_{t,r}^{i}= \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,r}}\varrho_{t}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\tilde g_{\xi_{t},\xi_{t+1}}(t+1),\ldots\big). \end{align*}

Proof. This follows immediately from Lemma A.3 by applying Theorem 2.1 with cost sequence

(3.4) \begin{align} \Big(\tilde g_{i,j}(t),\tilde g_{i,j}(t+1),\ldots,\tilde g_{i,j}(r-1), \tilde g_{i,j}(r)+\varrho_{r+1}\big(g_j(r+1),g_j(r+2),\ldots\big)\Big)_{i,j \in\mathcal I}, \end{align}

noting that $ \varrho_{r+1}\big(g_j(r+1),g_j(r+2),\ldots\big)\in L^\infty_{\mathcal G_{r+1}}$ .

We arrive at the following verification theorem.

Theorem 3.1. The pointwise limits $ \big\{\tilde V_{t}^{i}\big\}_{t\in\mathbb T,i\in\mathcal I}\,:\!=\,\lim_{r\to\infty}\big\{\hat V_{t,r}^{i}\big\}_{t\in\mathbb T,i\in\mathcal I}$ exist and satisfy

\begin{align*} \tilde V_{t}^{i}=V_t^{i},\quad\forall\,t\in\mathbb T. \end{align*}

Furthermore, starting from any $ t \in \mathbb{T}$ and $ i \in \mathcal{I}$ , the limit family defines an optimal strategy $ \xi^* \in \mathcal U^i_t$ as follows:

\begin{equation*} \begin{cases} \xi_r^* = i, \quad r < t,\\[4pt] \xi_r^*\in {\arg\min}_{j \in\mathcal I}\rho_r\Big(\tilde g_{\xi_{r-1}^*,j}(r)+\tilde V^{j}_{r+1}\Big), \quad r \ge t. \end{cases} \end{equation*}

Proof. From Lemma 3.1 and as $ \mathcal U^i_{t,r}\subset \mathcal U^i_{t,r+1}\subset \mathcal U^i_{t}$ for all $ 0 \le t \le r$ , the sequence $ \big\{\hat V_{t,r}^{i}\big\}_{r\geq 0}$ is non-increasing and $ \hat V_{t,r}^{i}\geq V_t^{i}$ for all $ r\geq 0$ . Furthermore, as it is bounded from below by the sequence $ \big\{\varrho_t\big({-}\bar g(t),-\bar g(t+1),\ldots\big)\big\}_{t\in\mathbb T}$ (owing to monotonicity) and $ \big\{{-}\bar g(t)\big\}_{t \in \mathbb T} \in H_\mathcal F$ , we conclude that the sequence $ \Big\{\big\{\hat V_{t,r}^{i}\big\}_{t\in\mathbb T,i\in\mathcal I}\Big\}_{r\geq 0}$ converges pointwise.

Now, by Assumption 3.1 there is a non-negative decreasing deterministic sequence $ \{K_s\}_{s\in \mathbb T}$ , with $ \lim_{s \to \infty}K_s = 0$ , such that, for all $ \xi\in\mathcal U^i_{t,r}$ ,

(3.5) \begin{align}\nonumber &\big|\varrho_{r+1}\big(\tilde g_{\xi_{r},\xi_{r+1}}(r+1),\tilde g_{\xi_{r+1},\xi_{r+2}}(r+2),\ldots\big)\big| \\ & = \sum_{j\in\mathcal I}\mathbf{1}_{\{\xi_r=j\}} \big|\varrho_{r+1}\big( g_{j}(r+1),g_{j}(r+2),\ldots\big)\big|\leq K_{r+1}, \end{align}

and

(3.6) \begin{align} |\varrho_{r+1}\big({-}\bar g(r+1),-\bar g(r+2),\ldots\big)\big|& \leq K_{r+1}. \end{align}

Then, by Lemma A.3, (3.5) gives

\begin{align*} \hat V_{t,r}^{i} &= \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,r}}\varrho_{t}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\tilde g_{\xi_{t},\xi_{t+1}}(t+1),\ldots\big) \\ &\leq \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,r}} \rho_{t,r+1}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r),K_{r+1}\big) \\ &= \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,r}} \rho_{t,r}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r)\big)+K_{r+1}, \end{align*}

and (3.6) implies that

\begin{align*} V_t^{i}&= \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_t}\varrho_{t}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\tilde g_{\xi_{t},\xi_{t+1}}(t+1),\ldots\big) \\ &\geq \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t}}\rho_{t,r+1}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r),-K_{r+1}\big) \\ &= \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,r}}\rho_{t,r}\big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r)\big)-K_{r+1}. \end{align*}

We conclude that $ \hat V_{t,r}^{i}-V_t^{i}\leq 2K_{r+1}$ . Letting $ r\to\infty$ gives the first statement.

For the second part, first note that the following inequality holds:

(3.7) \begin{equation} V_t^{i} \ge \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,r}}\rho_{t,r}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r) + V^{\xi_{r}}_{r+1}\Big), \quad 0 \le t \le r. \end{equation}

Indeed, for every $ 0 \le t \le r$ and $ \xi \in \mathcal U^i_{t}$ we can use Lemma A.3 to argue that

\begin{align*} \varrho_{t}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\tilde g_{\xi_{t},\xi_{t+1}}(t+1),\ldots\Big) & = \rho_{t,r}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r) + \hat{V}_{r+1,r}^{\xi_{r}}\Big) \\ & \ge \rho_{t,r}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r) + V^{\xi_{r}}_{r+1}\Big) \\ & \ge \mathop{\text{ess} \, \text{inf}}\limits_{\xi^{\prime}\in\mathcal U^i_{t,r}}\rho_{t,r}\Big(\tilde g_{\xi^{\prime}_{t-1},\xi^{\prime}_t}(t),\ldots,\tilde g_{\xi^{\prime}_{r-1},\xi^{\prime}_{r}}(r) + V^{\xi^{\prime}_{r}}_{r+1}\Big), \end{align*}

and since this is true for every $ \xi \in \mathcal U^i_{t}$ we get (3.7). Next, momentarily fix $ 0 \le t \le r$ and replace $ g_{j}(r)$ with $ g_{j}(r) + V^{j}_{r+1}$ . Then, using Theorem 2.1 with $ T = r$ , we have

\begin{align*} V^i_t & \geq \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,r}}\rho_{t,r}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r) + V^{\xi_{r}}_{r+1}\Big) \\ & = \rho_{t,r}\Big(\tilde g_{i,\xi_{t}^*}(t),\tilde g_{\xi_{t}^*,\xi_{t+1}^*}(t+1),\ldots,\tilde g_{\xi_{r-1}^*,\xi_{r}^*}(r) + V^{\xi_{r}^*}_{r+1}\Big) \\ & \ge \rho_{t,r}\Big(\tilde g_{i,\xi_{t}^*}(t),\tilde g_{\xi_{t}^*,\xi_{t+1}^*}(t+1),\ldots,\tilde g_{\xi_{r-1}^*,\xi_{r}^*}(r)\Big) - K_{r+1}. \end{align*}

Letting $ r \to\infty $ we conclude that

\begin{align*} V^i_t & \geq\varrho_{t}\big(\tilde g_{i,\xi_{t}^*}(t),\tilde g_{\xi_{t}^*,\xi_{t+1}^*}(t+1),\ldots\big), \end{align*}

from which it follows that $ \xi^*$ is an optimal strategy.
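To see the truncation argument at work numerically, a sketch along the following lines computes the truncated values $ \hat V^i_{0,r}$ of (3.3) for increasing r on a toy scenario space, with discounted bounded costs (so that Assumption 3.1 holds with $ \bar g(t) = \alpha^t$ ), no switching costs, and the unknown tail value replaced by zero; the risk mapping, the cost data and all parameters are hypothetical.

```python
import itertools
import numpy as np

m, alpha, depth, R_MAX = 2, 0.8, 6, 16     # modes, discount, noise depth, largest horizon
rng = np.random.default_rng(5)
omegas = np.array(list(itertools.product([0, 1], repeat=depth)))
n = len(omegas)

def rho(t, X):
    """Conditional worst-case mapping; G_t is generated by the first min(t+1, depth)
    noise coordinates (the toy filtration saturates after `depth` steps)."""
    out = np.empty(n)
    keys = [tuple(w[: min(t + 1, depth)]) for w in omegas]
    for key in set(keys):
        idx = [i for i, k in enumerate(keys) if k == key]
        out[idx] = X[idx].max()
    return out

# Discounted bounded costs g_j(t) with zero switching costs, so |tilde g| <= alpha^t and
# the neglected tail varrho_{r+1}(...) is bounded in absolute value by alpha^(r+1)/(1-alpha).
g = alpha ** np.arange(R_MAX + 1)[:, None, None] * rng.uniform(0.0, 1.0, (R_MAX + 1, m, n))

def truncated_value(r):
    """Truncated value V_{0,r}^i from (3.3), with the tail value beyond time r set to zero."""
    V = np.zeros((r + 2, m, n))
    for t in range(r, -1, -1):
        for i in range(m):
            V[t, i] = np.min([rho(t, g[t, j] + V[t + 1, j]) for j in range(m)], axis=0)
    return V[0, :, 0]

for r in (2, 4, 8, 16):
    print(r, np.round(truncated_value(r), 4))   # the truncated values stabilise as r grows
```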

We also record the following corollary, which will be used in the proof of Theorem 3.2.

Corollary 3.1. The value process for the infinite-horizon optimal switching problem (3.2) satisfies the following dynamic programming principle:

\begin{equation*} V_t^{i} = \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,r}}\rho_{t,r}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{r-1},\xi_{r}}(r) + V^{\xi_{r}}_{r+1}\Big), \quad 0 \le t \le r. \end{equation*}

Proof. We only need to prove that the following recursion holds:

(3.8) \begin{align}V^i_t=\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+V^{j}_{t+1}\Big).\end{align}

The general result then follows from Theorem 2.1 with $ T = r$ and replacing $ g_{j}(r)$ with $ g_{j}(r) + V^{j}_{r+1}$ . Taking limits on both sides in (3.3) gives

\begin{align*}V^i_t&=\lim_{r\to\infty}\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+\hat V^{j}_{t+1,r}\Big)\\&\leq \lim_{r\to\infty}\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+ V^{j}_{t+1}+2K_{r+1}\Big)\\&=\lim_{r\to\infty}\bigg\{\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+ V^{j}_{t+1}\Big)+2K_{r+1}\bigg\}\\&=\min_{j\in\mathcal I} \rho_t\Big(\tilde g_{i,j}(t)+ V^{j}_{t+1}\Big).\end{align*}

Since the reverse inequality follows as a special case of (3.7), the proof is complete.

3.2. Relation to systems of RBS $ \Delta$ Es

Definition 3.2 (Infinite-horizon RBS $ \Delta$ Es). The infinite-horizon extension of Definition 2.1 (with $ \mathbb T=\mathbb N_0$ ) is given by

(3.9) \begin{equation} \begin{cases} Y^i_t = Y^{i}_{T} + \sum_{s=t}^{T-1}\rho_s\big(g_{i}(s)+\Delta M^i_{s+1}\big)-\big(M^i_T-M^i_t\big) \\[3pt] \qquad-\big(A^i_{T}-A^i_t\big),\quad \forall \; t,T \in \mathbb{T}\ \text{with}\ t \le T,\\[3pt] Y^i_t \leq \min_{j\in\mathcal I^{-i}} \rho_t\Big(\tilde{g}_{i,j}(t)+Y^j_{t+1}\Big), \quad\forall \; t \in \mathbb{T}, \\[3pt] \sum_{t \in \mathbb{T} }\Big(Y^i_t - \min_{j\in\mathcal I^{-i}} \rho_t\Big(\tilde{g}_{i,j}(t)+Y^j_{t+1}\Big)\Big)\Delta A^i_{t+1}=0. \end{cases} \end{equation}

A solution to the system of RBS $ \Delta$ Es (3.9) is a triple $ (Y,M,A)\in \mathcal L^{\infty,d}_\mathbb G\times \big(\mathcal L_\mathbb{G}^{\infty}\big)^2$ with M a $ \rho_{s,t}$ -martingale and A a $ \mathbb G$ -predictable non-decreasing process.

Remark 3.2. In the special case when the limits $ M_\infty=\lim_{t\to\infty}M_t$ and $ A_\infty=\lim_{t\to\infty}A_t$ exist $ \mathbb{P}$ -almost surely as members of $ L^\infty_{\mathcal G}$ , the infinite-horizon RBS $ \Delta$ E (3.9) can be written

(3.10) \begin{equation} \begin{cases} Y^i_t = \sum_{s=t}^{\infty}\rho_s\Big(g_{i}(s)+\Delta M^i_{s+1}\Big)-\big(M^i_\infty-M^i_t\big) \\[4pt] \qquad -\big(A^i_{\infty}-A^i_t\big),\quad\forall \; t \in \mathbb{T},\\[4pt] Y^i_t \leq \min_{j\in\,\mathcal I^{-i}} \rho_t\Big(\tilde{g}_{i,j}(t)+ Y^j_{t+1}\Big), \quad\forall \; t \in \mathbb{T},\\[4pt] \sum_{t \in \mathbb{T}} \Big(Y^i_t - \min_{j\in\,\mathcal I^{-i}} \rho_t\Big(\tilde{g}_{i,j}(t)+ Y^j_{t+1}\Big)\Big)\Delta A^i_{t+1}=0. \end{cases} \end{equation}

We also emphasise that $ Y \in \mathcal L^{\infty,d}_\mathbb G$ implies the boundary condition $ \lim_{T \to \infty}Y^{i}_{T} = 0$ for all $ i \in \mathcal I$ .

We have the following extension of Theorem 2.2.

Theorem 3.2. The system of RBS $ \Delta$ Es (3.9) has a unique solution. Furthermore, the solution satisfies $ Y=V$ .

Proof. Existence: By Corollary 3.1, the value process V satisfies the following dynamic programming relation for any $ T\in\mathbb T$ :

\begin{align*} V_t^{i}&=\mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_{t,T}}\rho_{t,T}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}(T)+V^{\xi_{T}}_{T+1}\Big), \quad 0 \le t \le T. \end{align*}

Using Theorem 2.2, this implies for every $ T\in\mathbb T$ that (V, M, A) is the unique solution to

\begin{equation*} \begin{cases} V^i_t = V^i_T+\sum_{s=t}^{T-1}\rho_s\big(g_{i}(s)+\Delta M^i_{s+1}\big)-\big( M^i_T- M^i_t\big)-\big( A^i_{T}- A^i_t\big),\\[2pt] \qquad t=0,\ldots,T,\\[2pt] V^i_t \leq \min_{j\in\mathcal I^{-i}} \rho_t\Big(\tilde g_{i,j}(t)+ V^j_{t+1}\Big),\quad t=0,\ldots,T,\\[4pt] \sum_{t=0}^T\Big( V^i_t - \min_{j\in\mathcal I^{-i}} \rho_t\Big(\tilde g_{i,j}(t)+ V^j_{t+1}\Big)\Big)\Delta A^i_{t+1}=0, \end{cases} \end{equation*}

where $ M^{i}_0 = A^{i}_0 = 0$ and

\begin{align*} \left\{\begin{array}{l} \Delta M^{i}_{t+1}=V^i_{t+1}-\rho_t\big(V^i_{t+1}\big), \\[7pt] \Delta A^{i}_{t+1}=\rho_{t}\big(g_i(t)+V^i_{t+1}\big)-V^i_t. \end{array}\right. \end{align*}

Furthermore, since this unique definition for the vector-valued processes M and A is independent of T, it follows that (V, M, A) satisfies Equation (3.9).

By the proof of Theorem 3.1, there exists a decreasing deterministic sequence $ \{K_{t}\}_{t \in \mathbb{T}}$ such that $ \big|V^i_T\big| \le K_{T}$ and $ \lim_{T \to \infty}K_{T} = 0$ . Therefore $ V \in \mathcal L_\mathbb G^{\infty,d}$ and we conclude that (V, M, A) is a solution to (3.9).

Uniqueness: To show uniqueness, we note that if (Y, N, B) is any other solution to (3.9), then by again truncating at time $ T \ge t$ and applying Theorem 2.2 we have that

\begin{align*} Y_t^{i}&=\mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_t}\rho_{t,T}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}(T)+Y^{\xi_{T}}_{T+1}\Big). \end{align*}

Since $ Y, V \in\mathcal L_\mathbb G^{\infty,d}$ and $ \mathcal{I}$ is finite, we can define a deterministic sequence $ \{K_s\}_{s\in\mathbb T}$ with $ K_s\to 0$ as $ s\to\infty$ such that $ \big|V^{i}_t-Y^{i}_t\big|\leq K_t$ for all $ t\in\mathbb T$ and $ i \in \mathcal{I}$ . Appealing once more to the dynamic programming relation, we obtain

\begin{align*} Y_t^{i} & \leq \mathop{\text{ess} \, \text{inf}}\limits_{\xi\in\mathcal U^i_t}\rho_{t,T}\Big(\tilde g_{\xi_{t-1},\xi_t}(t),\ldots,\tilde g_{\xi_{T-1},\xi_{T}}(T)+V^{\xi_{T}}_{T+1}+K_{T+1}\Big) \\ & = V_t^{i} + K_{T+1}, \end{align*}

and similarly we have that $ V_t^{i}\leq Y_t^{i}+K_{T+1}$ . Letting $ T\to\infty$ we find that $ V_t^{i} = Y_t^{i}$ for all $ i \in \mathcal{I}$ , and uniqueness follows.

3.3. Relation to optimal stopping

As an extension to Section 2.3 above, we specialise to the case of optimal stopping on an infinite horizon:

(3.11) \begin{equation} F_t\,:\!=\, \mathop{\text{ess} \, \text{inf}}\limits_{\tau\in\mathscr{T}_{t}}\rho_{t,\tau}(f(t),\ldots,f(\tau-1),h(\tau)). \end{equation}

The above result for infinite-horizon optimal switching problems naturally extends the finite-horizon optimal stopping results of Section 2.3 to the infinite horizon. We have the following.

Corollary 3.2. The value process F satisfies the dynamic programming relation

\begin{align*} F_{t}=\rho_t(f(t)+F_{t+1})\wedge\rho_t(h(t)) \end{align*}

for all $ t\in\mathbb T$ , and an optimal stopping time $ \tau^*_t$ is given by

\begin{align*} \tau^*_t\,:\!=\, \inf\{s\geq t:F_s=\rho_s(h(s))\}. \end{align*}

Furthermore, there exist a $ \rho_{s,t}$ -martingale M and a non-decreasing $ \mathbb G$ -predictable process A such that (F, M, A) is the unique solution to the RBS $ \Delta$ E

(3.12) \begin{equation} \begin{cases} F_t = F_{T} +\sum_{s=t}^{T-1}\rho_s\big(f(s)+\Delta M_{s+1}\big)-\big(M_T-M_t\big) \\ \qquad-(A_{T}-A_t),\quad \forall \; t \in \mathbb{T} \;\text{and}\; T \in \mathbb{T}\; \text{with}\; t \le T,\\ F_t \leq \rho_t(h(t)), \quad\forall \; t \in \mathbb{T}, \\ \sum_{t \in \mathbb{T}}\big(F_t - \rho_t(h(t))\big)\Delta A_{t+1}=0. \end{cases} \end{equation}

Proof. This follows immediately from Theorem 3.1 through the analogy between optimal switching problems and optimal stopping problems described in Section 2.3.

4. Example: delayed or missing observations

In this section we aim to add some colour to the above results by illustrating the interplay between delayed or missing observations and risk awareness. We demonstrate that this issue should be treated differently from the case of the linear expectation, since otherwise suboptimal actions may result.

Let $ (\Omega, \mathcal F, \mathbb{F},\mathbb{P})$ be a filtered probability space, and consider either the finite- or the infinite-horizon problem above. Let the process of essentially bounded costs $ (\tilde g_{i,j}(t), i,j \in \mathcal{I})_{t \in \mathbb{T}}$ be adapted (and, in the infinite-horizon case, also satisfy Assumption 3.1), and let $ \rho^\mathbb F$ be an $ \mathbb{F}$ -conditional risk mapping. Suppose that the observation at some time s is delayed. To model this, let $ {\mathbb{G}}$ be the filtration given by

\begin{equation*} {\mathcal G_t} = \begin{cases} \mathcal F_{s-1}, & t = s, \\[2pt] \mathcal F_{t}, & \text{otherwise,} \end{cases} \end{equation*}

and let $ \rho$ be the conditional risk mapping given by

\begin{equation*} {\rho}_t = \begin{cases} \rho^\mathbb F_{s-1}, & t = s, \\[6pt] \rho^\mathbb F_{t}, & \text{otherwise.} \end{cases} \end{equation*}

Indeed, since we will examine the decision taken at time s rather than at later times, the observation at time s may equivalently be missing rather than delayed.

For any time $ t \in \mathbb{T}$ with $ t \neq s$ , the value processes at time t are given by the dynamic programming equations (2.2) or (3.8) and conditional translation invariance,

(4.1) \begin{align} \hat V_t^{i}=\min_{j\in\mathcal I} \Big(\tilde g_{i,j}(t)+ \rho_t\Big(\hat V^{j}_{t+1}\Big)\Big), \end{align}

while the missing observation at time s means that

(4.2) \begin{align} \hat V_s^{i}&=\min_{j\in\mathcal I} \rho_s \Big(\tilde g_{i,j}(s)+ \hat V^{j}_{s+1}\Big), \end{align}
(4.3) \begin{align} \xi_s^i &\in \mathop{\arg\min}\limits_{j \in \mathcal I} \rho_s\Big(\tilde g_{i,j}(s)+\hat V^{j}_{s+1}\Big). \end{align}

When $ \rho^\mathbb F$ is the linear (conditional) expectation, this is equivalent to the following value and choice of mode:

(4.4) \begin{align} \check V_s^{i}=\min_{j\in\mathcal I} \Big({\rho}_s(\tilde g_{i,j}(s))+ {\rho}_s\Big(\hat V^{j}_{s+1}\Big)\Big), \end{align}
(4.5) \begin{align} \check{\xi}_s^i \in \mathop{\arg\min}\limits_{j \in \mathcal I}\Big({\rho}_s(\tilde g_{i,j}(s)) + {\rho}_s\Big(\hat V^{j}_{s+1}\Big)\Big). \end{align}

The intuitively obvious fact that the selections (4.3) and (4.5) may differ can be confirmed by suitably modifying the costs at time s, as follows. For $ f \in m\mathcal{F}$ define

(4.6) \begin{align} \begin{cases}\check C_{i,j}(f) &= {\rho}_s\big(\tilde g_{i,j}(s) \big)+ {\rho}_s\Big(\hat V^{j}_{s+1}\Big) - f, \\[5pt]\hat C_{i,j}(f) &= {\rho}_s\Big(\tilde g_{i,j}(s) + \hat V^{j}_{s+1}\Big) - f.\end{cases}\end{align}

We assume that

(4.7) \begin{align}\check C_{i,j}(0) > \hat C_{i,j}(0) \text{ for each } i,j \in \mathcal I\end{align}

(which is true for example if the risk mapping $ \rho^\mathbb F$ is subadditive), and that for some $ l \in \mathcal I$ we have

(4.8) \begin{equation} \mathbb{P}\big(\check C_{l,1}(0) - \hat C_{l,1}(0) = \check C_{l,2}(0) - \hat C_{l,2}(0) \; \big) < 1,\end{equation}

setting $ l = 1$ without loss of generality.

Remark 4.1. Clearly, these assumptions fail when $ \rho^\mathbb{F}$ is linear (and in the finite-horizon case, they require that $ s < T$ ). They can be understood as ensuring that $ \rho^\mathbb{F}$ is ‘sufficiently nonlinear’ on the problem data. The inequality (4.7) serves to reduce combinatorial complexity.

We argue as follows:

  1. Defining for each $ n \in \mathbb{N}$ and $ i=1,2$ the events

    (4.9) \begin{align}A^i_n = \big\{\omega \in \Omega\,:\, \check C_{1,3-i}(0) - \hat C_{1,3-i}(0) > \check C_{1,i}(0) - \hat C_{1,i}(0) + 2/n\big\},\end{align}
    by the assumption (4.8) at least one of these events ( $ A^1_n$ , say) has positive probability.
  2. Setting $ f_{1,1} = \check C_{1,1}(0) - \check C_{1,2}(0) + 1/n$ , we have

    (4.10) \begin{align} \check C_{1,2}(0) = \check C_{1,1}(f_{1,1}) + 1/n.\end{align}
  3. We now further reduce combinatorial complexity by adjusting costs so that under both selections (4.3) and (4.5), when started in mode $ i=1$ at time $ s-1$ , at time s only remaining in mode 1 or switching to mode 2 can be optimal. That is, we would like the following to hold:

    (4.11) \begin{align} \begin{cases}{\arg\min}_{j \in \mathcal I}\big\{\hat C_{1,j}\big(\bar f_{1,j}\big)\big\} \subset \{1,2\}, \\[5pt]{\arg\min}_{j \in \mathcal I}\big\{\check C_{1,j}\big(\bar f_{1,j}\big)\big\} \subset \{1,2\}.\end{cases} \end{align}
    By straightforward linear algebra and (4.7), this can be achieved by taking
    (4.12) \begin{align}\bar f &= 1 + \text{ess} \, \text{sup} \big\{\check C_{1,1}(f_{1,1}), \hat C_{1,1}(f_{1,1}), \check C_{1,2}(0), \hat C_{1,2}(0)\big\} - \mathop{\text{ess} \, \text{inf}}\limits_{k>2} \big\{\check C_{1,k}(0), \hat C_{1,k}(0) \big\} \nonumber\\[3pt] &= 1 + \text{ess} \, \text{sup} \big\{\check C_{1,1}(f_{1,1}), \check C_{1,2}(0)\big\} - \mathop{\text{ess} \, \text{inf}}\limits_{k>2} \big\{\hat C_{1,k}(0)\big\}.\end{align}
  4. Finally, to observe a difference between the selections (4.3) and (4.5), set

    (4.13) \begin{align}\bar f_{1,j} =\begin{cases}\bar f+ f_{1,1}, & j=1, \\[3pt]\bar f, & j=2, \\[3pt]0, & j>2,\end{cases}\end{align}
    since then on $ A^1_n$ we have
    (4.14) \begin{align}\check C_{1,2}\big(\bar f_{1,2}\big) > \check C_{1,1}\big(\bar f_{1,1}\big) > \hat C_{1,1}\big(\bar f_{1,1}\big) > \hat C_{1,2}\big(\bar f_{1,2}\big),\end{align}
    where the first inequality comes from combining (4.6), (4.10), and (4.13), the second from (4.7), and the third from combining (4.9) with (4.10).

Noting that $ \bar f_{1,1}$ and $ \bar f_{1,2}$ are $ \mathcal G_s$ -measurable, we see from (4.6) that the two selections differ if we modify just the two costs $ \tilde g_{1,1}(s)$ and $ \tilde g_{1,2}(s)$ by replacing $ \tilde g_{1,j}(s)$ with $ \tilde g_{1,j}(s) - \bar f_{1,j}$ for $ j \in \{1,2\}$ . In particular, from (4.11) and (4.14), if the system is in mode 1 at time $ s-1$ , then on $ A^1_n$ , (4.5) selects mode 2 at time s while (4.3) selects mode 1 at time s.
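The mechanism behind this construction can also be seen in a bare-hands numerical example. The sketch below uses two equally likely scenarios, two modes, the entropic risk with $ \gamma = 1$ (unconditional, since the time-s observation is missing), and hand-picked costs; all numbers are hypothetical and serve only to show that the selections (4.3) and (4.5) can disagree.

```python
import numpy as np

# Two equally likely scenarios; G_s is trivial because the time-s observation is missing.
p = np.array([0.5, 0.5])
rho = lambda X: np.log(p @ np.exp(X))       # entropic risk with gamma = 1 (hypothetical)

# Hypothetical time-s costs and continuation values for modes j in {1, 2}:
g_tilde = {1: np.array([1.0, -1.0]), 2: np.array([0.5, 0.5])}   # \tilde g_{1,j}(s)
V_next  = {1: np.array([-1.0, 1.0]), 2: np.array([0.2, 0.2])}   # \hat V^j_{s+1}

risk_aware = {j: rho(g_tilde[j] + V_next[j]) for j in (1, 2)}            # criterion in (4.3)
separated  = {j: rho(g_tilde[j]) + rho(V_next[j]) for j in (1, 2)}       # criterion in (4.5)

print("rho(g + V):      ", risk_aware, "-> mode", min(risk_aware, key=risk_aware.get))
print("rho(g) + rho(V): ", separated, "-> mode", min(separated, key=separated.get))
# Mode 1 hedges the continuation risk, so (4.3) selects it, whereas the separated
# criterion (4.5) ignores the hedging and selects mode 2.
```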

5. Example: a hydropower planning problem

In this section we first illustrate the above framework for risk-aware optimal switching under general filtration by formulating a non-Markovian hydropower planning problem (Sections 5.1–5.4). In Sections 5.5–5.8 we provide practical dynamic programming equations, an approximate numerical scheme for the problem, a solution algorithm using neural networks, and a discussion of numerical results.

5.1. Decision space and market

Consider a hydropower producer whose interventions take the form of bidding into a market. The producer sells electricity in a daily spot market at noon on the day before delivery. Let $ T=9$ , $ \mathbb T\,:\!=\,\{0,\ldots,T\}$ , and $ \mathbb T^+\,:\!=\,\{0,\ldots,T+1\}$ . Here, $ t \in \mathbb T \cup \{{-}1\}$ represents a decision epoch at hour 12 of day t, where day $ -1$ is the last day of the previous planning period. We assume one-hour planning periods, so that at decision epoch $ t\in \mathbb T$ , the producer hands in a list of bids $ B_{t}\,:\!=\,\big(B^E_{t+1,1},\ldots,B^E_{t+1,24};\,B^P_{t+1,1},\ldots,B^P_{t+1,24}\big)$ , where $ B^E_{t+1,l}$ specifies the quantity of electrical energy offered and $ B^P_{t+1,l}$ the acceptable price for hour l of day $ t+1$ . Just after decision epoch t, the market clears and the prices of electricity are published. If the market price $ R_{t+1,l}$ of electricity for hour l exceeds the producer’s bid price $ B^P_{t+1,l}$ , the producer is obligated to deliver the bidden volume $ B^E_{t+1,l}$ of electrical energy during hour l of day $ t+1$ . For this the producer receives a payment $ R_{t+1,l}B^E_{t+1,l}$ . The total income arising from the bid vector $ B_t$ made at decision epoch t is thus given by

(5.1) \begin{align}\sum_{l=1}^{24}\mathbf{1}_{\big\{B^P_{t+1,l}\leq R_{t+1,l}\big\}}R_{t+1,l}B^E_{t+1,l}.\end{align}

If, on the other hand, a bid is accepted and the reservoir contains insufficient water to deliver the bidden volume, the producer has to purchase the undelivered energy from the balancing power market at a price $ R^F$ , which is usually higher than the spot price. This induces the cost

(5.2) \begin{align}\sum_{l=1}^{24}\mathbf{1}_{\big\{B^P_{t+1,l}\leq R_{t+1,l}\big\}}R^F_{t+1,l}\big(B^E_{t+1,l}-E_{t+1,l}\big)^+\end{align}

of undelivered energy, where $ E_{t,l}$ is the electrical energy produced during hour $ l\in\{1,\ldots,24\}$ of day t.
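A direct transcription of the cash flows (5.1) and (5.2) for a single delivery day could look as follows; the bid, price and production figures are randomly generated placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
H = 24                              # delivery hours in a day

# Hypothetical data for day t+1 (energy in MWh, prices in currency per MWh).
B_E = rng.uniform(0.0, 10.0, H)     # offered energy   B^E_{t+1,l}
B_P = rng.uniform(20.0, 60.0, H)    # bid prices       B^P_{t+1,l}
R   = rng.uniform(20.0, 60.0, H)    # spot prices      R_{t+1,l}
R_F = R + rng.uniform(0.0, 20.0, H) # balancing prices R^F_{t+1,l}, typically above spot
E   = rng.uniform(0.0, 10.0, H)     # energy actually produced E_{t+1,l}

accepted  = B_P <= R                                              # bid accepted for hour l
income    = np.sum(accepted * R * B_E)                            # income (5.1)
imbalance = np.sum(accepted * R_F * np.maximum(B_E - E, 0.0))     # cost (5.2)
print(f"income = {income:.1f}, imbalance cost = {imbalance:.1f}")
```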

5.2. Probability space, inflow and price processes

We take $ (\Omega,\mathcal F,\{\mathcal F_t\}_{t \in \mathbb{T}^{+}},\mathbb{P})$ to be a filtered probability space, with $ \mathcal F_t$ representing the information available at noon on day $ t \in \mathbb{T}^{+}$ . This space will be rich enough to support a Markovian price process $(\tilde R_t)_{t\in\mathbb{T}^{+}}$ and a non-Markovian inflow process $ (\tilde I_t)_{t\in\mathbb T^+}$ , as follows.

As is common in electricity planning problems, we assume that the electricity price vector $ (\tilde R_t)_{t\in\mathbb T^+}\,:\!=\,(R_{t,1},\ldots,R_{t,24})_{t\in\mathbb T^+}$ is a bounded Markov process adapted to $ \mathbb F$ . Regarding the inflow process, even under normal conditions, heavy rainfall only leads to increased inflows to a reservoir after a time delay, as the water is filtered through the catchment area surrounding the reservoir. Moreover, the hydropower station may be located in a mountainous area where river flows depend heavily on the melting of snow masses in a spring flood. To model the discrete-time process of inflows $ \{I_{t,j}\}^{1\leq j\leq 24}_{t\in\mathbb T}$ , where $ I_{t,l}$ is the inflow of water from the surroundings during hour l of day t, let $ (H_s)_{s\geq 0}$ be a continuous-time Markov process representing relevant environmental conditions. To account for the dependence of inflows on environmental conditions, set

(5.3) \begin{align}I_{t,j}=\int_{t+(j-1)/24}^{t+j/24}\int_{0}^{\delta}h(s)H_{(r-s)^{+}}dsdr,\end{align}

where $ \delta$ is a constant time lag and h a deterministic function. Then $ (\tilde I_t)_{t\in\mathbb T^+}\,:\!=\,(I_{t-1,13},\ldots,I_{t,12})_{t\in\mathbb T^+}$ is adapted to $ \mathbb F$ and non-Markovian.
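
Numerically, the double integral in (5.3) can be approximated by Riemann sums once a trajectory of H is available on a fine grid. The sketch below assumes a grid of step dt (in days) and uses illustrative names; with the specification of Section 5.6 below one would take, for example, $\delta=2/24$ and $h(s)=\sin(s\pi/\delta)$.

```python
import numpy as np

def hourly_inflow(H_path, dt, delta, h, t, j):
    """Riemann-sum approximation of I_{t,j} in (5.3) for day t and hour j.

    H_path is a trajectory of H sampled at times 0, dt, 2*dt, ... (in days)
    and must cover [0, t + j/24]; delta is the time lag and h the
    deterministic weighting function."""
    r_grid = np.arange(t + (j - 1) / 24, t + j / 24, dt)   # outer variable r
    s_grid = np.arange(0.0, delta, dt)                     # inner variable s
    weights = h(s_grid)
    total = 0.0
    for r in r_grid:
        idx = (np.maximum(r - s_grid, 0.0) / dt).astype(int)   # locate H_{(r-s)^+}
        total += np.sum(weights * H_path[idx])
    return total * dt * dt                                  # ds * dr weights
```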

5.3. Dynamics of the hydropower system

We assume that the hydropower system consists of one reservoir containing the volume $ M_{t,l}$ at the beginning of hour l of day t and a plant that produces electricity

(5.4) \begin{align}E_{t,l} \,:\!=\, \eta\big(M_{t,l},F_{t,l}\big),\end{align}

where $ F_{t,l}$ is the flow of water directed through the turbines and $ \eta\,:\,\mathbb R^2_ +\to [0,C]$ is a deterministic function describing the efficiency of the plant, with $ C>0$ the installed capacity. We assume that the function $ y \mapsto \eta(m,y)$ is strictly increasing for each fixed m lying between the reservoir minimum level $ M_{\text{min}}$ and maximum $ M_{\text{max}}$ . The process $ M = (M_{t,l})_{t,l}$ of reservoir levels follows the dynamics

(5.5) \begin{align}\nonumber M_{t,l}&=\min\big\{\mathbf{1}_{[l>1]}\big(M_{t,l-1}-F_{t,l-1}+I_{t,l-1}\big) \\ &\quad+\mathbf{1}_{[l=1]}\big(M_{t-1,24}-F_{t-1,24}+I_{t-1,24}\big),M_{\text{max}}\big\},\end{align}

where $ M_{0,13}$ is the volume in the reservoir at the first decision epoch.
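
The hourly update (5.5) is straightforward to iterate; the following sketch (with illustrative names) returns the reservoir trajectory over one day, given the turbined flows and the inflows.

```python
import numpy as np

def reservoir_day(M_start, flows, inflows, M_max):
    """Levels at the beginning of hours 1..24 and the carry-over level for
    hour 1 of the next day, following the update rule (5.5).

    M_start is the level at the beginning of hour 1; flows and inflows are
    length-24 arrays of F and I for hours 1..24."""
    levels = np.empty(25)
    levels[0] = M_start
    for l in range(24):
        levels[l + 1] = min(levels[l] - flows[l] + inflows[l], M_max)
    return levels
```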

Also, as explained in [Reference Lundström, Olofsson and Önskog12], changing the production level by altering the flow $ F_{t,l}$ may necessitate the startup or shutdown of turbines, resulting in both wear and tear and temporarily decreased efficiency. This feature motivates the inclusion of switching costs in the optimisation problem.

5.4. The optimisation problem

The controllable parameters in the problem are the bid vectors $ \{B_t\}_{t\in\mathbb T}$ . With the reasonable assumption that these bids take values in a finite set $ \mathcal I\subset \mathbb R^{48}$ we have a switching problem. Let $ \xi\,:\!=\,(\xi_t)_{t\in\mathbb T}$ denote the switching control, so that $ \xi_t = B_t$ for each $ t \in \mathbb T$ .

By inverting the map $ y \mapsto \eta(m,y)$ we can recover, from the production plan and the reservoir level, the flow required to meet an accepted bid. Writing $ f(e,m)$ for the flow satisfying $ \eta(m,f(e,m))=e$, we obtain

(5.6) \begin{align} F_{t+1,l}=\min\Big(f\Big(B^E_{t+1,l},M_{t+1,l}\Big)\mathbf{1}_{\big\{B^P_{t+1,l}\leq R_{t+1,l}\big\}}, M_{t+1,l} - M_{\text{min}}\Big).\end{align}

Substituting (5.6) into (5.5) we see that $ M_{t}$ depends both on $ \omega$ and on the entire history of $ \xi$ up to time t. It follows that the switching costs are also dependent on this history. Therefore, recalling (5.1)–(5.6) and letting $ \mathcal I_t \,:\!=\, (\mathcal I)^{t+1}$ , for $ (i_{-1},\ldots,i_{t-1},i_{t}) \in \mathcal I_{t+1}$ we may define the rewards for the planning problem as

(5.7) \begin{equation}\begin{split}& \tilde g_{i_{-1:t-1},i_{-1:t}}(t) \,:\!=\, {} -c_{i_{-1:t}}\big(\tilde R_{t+1}\big)+\sum_{l=1}^{24}\mathbf{1}_{\big\{i_{t,24+l}\leq R_{t+1,l}\big\}}\big(R_{t+1,l}i_{t,l}\\& \qquad - R^F_{t+1,l}\big(i_{t,l}-\eta\big(M^{i_{-1:t}}_{t+1,l},\min\big(f(i_{t,l},M^{i_{-1:t}}_{t+1,l}),M^{i_{-1:t}}_{t+1,l} - M_{\text{min}}\big)\big)\big)^+\big) \\& \qquad + \mathbf{1}_{\{t=T\}}R^M M^{i_{-1:T}}_{T+2,1},\end{split}\end{equation}

where

  • $ i_{-1:t} \,:\!=\, (i_{-1},\ldots,i_{t-1},i_{t})$ ;

  • $ M^{i_{-1:t}}_{t+1,l}$ is the reservoir level at hour l on day $ t+1$ corresponding to the bid history $ i_{-1:t} \in \mathcal I_{t+1}$ ;

  • $ i_{t,m}$ is the mth component of $ i_t$ ;

  • $ R^M$ is the value of water stored at the end of the planning period;

  • and for each $ r\in\mathbb R^{24}_+$ , $ c_{i_{-1:t}}(r)$ is the cost rendered by switching from bid $ i_{t-1}$ to $ i_t$ when the price vector is r and the bid history is $ i_{-1:t}$ .

If the producer has risk mapping $ \rho$ , then for each $ t \in \mathbb{T}$ , given a bid history $ i_{-1:t-1} \in \mathcal I_{t}$ , the objective is to find

(5.8) \begin{equation}V_t^{i_{-1:t-1}} \,:\!=\, \mathop{\text{ess} \, \text{sup}}\limits_{\xi \in\mathcal U^{i_{-1:t-1}}_t}\rho_{t,T}\big(\tilde g_{\xi_{-1:t-1},\xi_{-1:t}}(t),\tilde g_{\xi_{-1:t},\xi_{-1:t+1}}(t+1),\ldots,\tilde g_{\xi_{-1:T-1},\xi_{-1:T}}(T)\big),\end{equation}

where $ \mathcal U^{i_{-1:t-1}}_t$ is the set of $ \mathbb F$ -adapted, $ \mathcal I$ -valued processes $ (\xi_s)_{s \in\mathbb T}$ such that $ \xi_{-1:t-1} = i_{-1:t-1}$ . Note that the reward $ \tilde g_{i_{-1:t-1},i_{-1:t}}(t)$ is $ \mathcal F_{t+1}$ -measurable but not $ \mathcal F_{t}$ -measurable. The producer’s problem is thus one of non-adapted (in this case, delayed) information.

5.5. Dynamic programming equations

By modifying the proof of Theorem 2.1 accordingly we can show that the value processes $ (V_t^{i_{-1:t-1}} \colon i_{-1:t-1} \in \mathcal I_t)_{t\in\mathbb T}$ corresponding to (5.8) satisfy the following analogue of (2.2):

(5.9) \begin{equation} \begin{cases} V_T^{i_{-1:T-1}} = \max_{j\in \mathcal I} \rho_T\big(\tilde g_{i_{-1:T-1},(i_{-1:T-1},j)}(T)\big),& {}\\[5pt] V_t^{i_{-1:t-1}} = \max_{j\in \mathcal I} \rho_t\Big(\tilde g_{i_{-1:t-1},(i_{-1:t-1},j)}(t)+ V^{(i_{-1:t-1},j)}_{t+1}\Big),& \text{for} \: 0\leq t<T, \end{cases}\end{equation}

where for $ i_{-1:t-1} \in \mathcal I_t$ and $ j \in \mathcal I$ we define $ (i_{-1:t-1},j) = (i_{-1},\ldots,i_{t-1},j)$ . In order to obtain a practical solution algorithm we observe that the same optimal control can be obtained by dynamic programming without requiring the entire bid history. Recalling (5.7), given $ \omega \in \Omega$ , for $ (i_{-1},\ldots,i_{t-1},i_{t}) \in \mathcal I_{t+1}$ the cost $ \tilde g_{i_{-1:t-1},i_{-1:t}}(t)$ depends on $ i_{-1:t-1}$ only through its final bid vector $ i_{t-1}$ and the reservoir level $ M^{i_{-1:t}}_{t+1,1}$ . Moreover, by (5.5) and (5.6), $ M^{i_{-1:t}}_{t+1,1}$ depends on $ i_{-1:t-1}$ only through $ M_{t,13}^{i_{-1:t-1}}$ and the final bid vector $ i_{t-1}$ . Thus for $ i_{-1:t} \in \mathcal I_{t+1}$ and $m \in [M_{\min},M_{\max}]$ we may define new (random) rewards $ \tilde{g}_{i_{t-1},i_{t}}(t, m)$ such that

(5.10) \begin{equation}\begin{split} &\tilde g_{i_{t-1},i_{t}}(t,m) \,:\!=\, {} -c_{i_{t-1},i_{t}}\big(\tilde R_{t+1}\big)+\sum_{l=1}^{24}\mathbf{1}_{\big\{i_{t,24+l}\leq R_{t+1,l}\big\}}\big(R_{t+1,l}i_{t,l} \\ & \qquad - R^F_{t+1,l}\Big(i_{t,l}-\eta\Big(M^{m,i_{t}}_{t+1,l},\min\Big(f\Big(i_{t,l},M^{m,i_{t}}_{t+1,l}\Big),M^{m,i_{t}}_{t+1,l}-M_{\text{min}}\Big)\Big)\Big)^+\Big) \\ & \qquad + \mathbf{1}_{\{t=T\}}R^M M^{m,i_{T}}_{T+2,1},\end{split}\end{equation}

where $ M^{m,i_{t}}_{t+1}$ is the vector of reservoir levels on day $ t+1$ given that on day t the reservoir was at level m at the beginning of hour 13 (i.e. at noon) and the bid vector was $ i_{t}$ . That is, $ \tilde{g}_{i_{t-1},i_{t}}(t, m)$ and $ \tilde g_{i_{-1:t-1},i_{-1:t}}(t)$ coincide when $ M_{t,13}^{i_{-1:t-1}} = m$ . Then define auxiliary value processes by

(5.11) \begin{equation}V_t^{i}(m) =\begin{cases}\max_{j\in \mathcal I} \rho_T\big(\tilde g_{i,j}(T, m)\big),& \text{for} \: t = T,\\[4pt]\max_{j\in \mathcal I} \rho_t\Big(\tilde g_{i,j}(t, m) + V_{t+1}^{j}\Big(M^{m,j}_{t+1,13}\Big)\Big) ,& \text{for} \: 0\leq t<T.\end{cases}\end{equation}

By construction we have $ V_t^{i_{t-1}}\big(M_{t,13}^{i_{-1:t-1}}\big) = V_t^{i_{-1:t-1}}$ ; this can be confirmed by backward induction. Therefore, if the auxiliary value function $ V_{t+1}^{j}(m)$ can be computed for each $ j \in \mathcal I$ and $m \in [M_{\min},M_{\max}]$ , then (5.9) and (5.11) provide equivalent dynamic programming equations over the set of modes $ \mathcal I$ . The benefit of (5.11) is that we do not need to remember the switching control’s entire history. Note that this reformulation is non-Markovian since $ (M_t)_{t \in \mathbb{T}}$ is not a Markov process. In the next section we present a numerical approximation to this scheme using neural networks.

5.6. Numerical scheme

Let $ \eta(M,F)=\eta_0MF$ with $ \eta_0=0.1$ and $ R_{t,l}=\big(1+|\sin(l\pi/12)|\big)\big(0\vee\tilde R_{t+(l-1)/24}\wedge C_R\big)$ , where the multiplicative coefficient models the daily trend, $ C_R=4$ is a price ceiling, and $ \tilde R$ solves the stochastic difference equation

\begin{align*}\tilde R_{t+1}-\tilde R_{t}=0.02\big(1-\tilde R_t\big)+0.05N_t,\end{align*}

where $ (N_t)_{t \in \mathbb{T}}$ are standard normal random variables.
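
A possible simulation of this price model is sketched below. Since the intra-day values $\tilde R_{t+(l-1)/24}$ of the daily factor are not specified further above, the sketch simply holds the factor constant within each day; this, together with the initial value, is an assumption made only for illustration.

```python
import numpy as np

def simulate_prices(T, C_R=4.0, R0=1.0, seed=0):
    """Hourly prices R_{t,l} for days 0..T+1 under the model of Section 5.6."""
    rng = np.random.default_rng(seed)
    R_tilde = np.empty(T + 2)
    R_tilde[0] = R0
    for t in range(T + 1):
        # mean-reverting daily difference equation for the factor R_tilde
        R_tilde[t + 1] = R_tilde[t] + 0.02 * (1 - R_tilde[t]) + 0.05 * rng.standard_normal()
    trend = 1 + np.abs(np.sin(np.arange(1, 25) * np.pi / 12))   # daily trend, hours 1..24
    capped = np.clip(R_tilde, 0.0, C_R)                          # max(0, min(R_tilde, C_R))
    return capped[:, None] * trend[None, :]                      # array of shape (T+2, 24)
```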

For the processes I and H of (5.3) we take $ \delta=2/24$ , $ h(s)\,:\!=\,\sin(s\pi/\delta)$ , and H to be a pure jump Markov process taking values in $ \{0,0.5,1\}$ with transition intensity matrix

\begin{align*}Q_H\,:\!=\,\left[\begin{array}{c@{\quad}c@{\quad}c} -1 & 0.5 & 0.5\\[3pt] 1 & -2 & 1\\[3pt] 2 & 0.5 & -2.5\end{array}\right],\end{align*}

representing no, medium, and heavy rainfall respectively. For numerical purposes we approximate H by a discrete-time Markov chain updating k times per hour, with transition matrix $ \exp\left(\frac{1}{24k}Q_H\right)$ .
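
The discretised weather chain can be built directly from $Q_H$ via the matrix exponential, as in the following sketch; the state labels and the sampling loop are illustrative.

```python
import numpy as np
from scipy.linalg import expm

Q_H = np.array([[-1.0, 0.5, 0.5],
                [ 1.0, -2.0, 1.0],
                [ 2.0, 0.5, -2.5]])
states = np.array([0.0, 0.5, 1.0])     # no, medium and heavy rainfall
k = 2                                   # updates per hour
P = expm(Q_H / (24 * k))                # one-step transition matrix exp(Q_H/(24k))

def simulate_weather(n_steps, start=0, seed=0):
    """Sample a path of the discretised weather chain, returning the H values."""
    rng = np.random.default_rng(seed)
    path = np.empty(n_steps)
    state = start
    for n in range(n_steps):
        path[n] = states[state]
        state = rng.choice(len(states), p=P[state])
    return path
```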

Moreover, let $ \mathcal I$ be a discretisation of the set $ [0,2]^{24}\times[0,4]^{24}$ (reflecting the fact that market bids have limited precision, for example 1 MWh and 0.01 euro), and let the hydropower producer’s risk aversion be modelled by an entropic risk measure, i.e.

\begin{align*}\rho_t(X)=-\frac{1}{\theta}\log\left(\mathbb E\big[e^{-\theta X}\big|\mathcal F_t\big]\right),\end{align*}

with parameter $ \theta>0$ . Finally, we assume that changes in production level cost $ 0.1$ Euro per MW and set $ R^F=10$ , $ R^M\,:\!=\,4$ , $ M_{\text{min}} = 10$ , $ M_{\text{max}} = 50$ , and $ k=2$ .
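
On a finite sample the entropic risk measure reduces to a log-sum-exp of the realisations. The empirical version below (illustrative name, with the usual max-shift for numerical stability) is the form that reappears in the Monte Carlo training objective of Section 5.7.

```python
import numpy as np

def entropic_risk(samples, theta):
    """Empirical entropic risk -(1/theta) log( mean( exp(-theta * X) ) );
    for theta = 0 it reduces to the sample mean (the risk-neutral case)."""
    x = np.asarray(samples, dtype=float)
    if theta == 0.0:
        return x.mean()
    z = -theta * x
    m = z.max()                            # shift before exponentiating for stability
    return -(m + np.log(np.mean(np.exp(z - m)))) / theta
```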

5.6.1. State-space description

To obtain a state-space description of our problem we introduce the state process $ (x_t)_{t\in\mathbb T}$, where $ x_t$ is the non-redundant information available at noon on day t; that is,

(5.12) \begin{align}x_t\,:\!=\,\left[\begin{array}{c} M_{t,13}\\[3pt] R_{t,24}\\[3pt] \{P_{t,j}\}_{13\leq j\leq 24}\\[3pt] \Big\{H^k_{t+1/2-l/24k}\Big\}_{l\in\big\{0,\ldots,24\delta k\big\}}\end{array}\right],\end{align}

where $ \{P_{t,j}\}_{13\leq j\leq 24}$ is the $ \mathcal F_t$ -measurable production schedule for the hours between noon and midnight of day t. In particular, the state contains the discretised weather trajectory for the past two hours (10 a.m. to noon), since, according to (5.3), the impact of precipitation is only fully revealed after this delay. Recalling the notation $ \xi^*$ of Theorem 2.1 for an optimal strategy, from Section 5.5 the optimal mode (bid vector) $ \xi_t^*$ depends on its previous value $ \xi_{t-1}^*$ only through the production schedule $ \{P_{t,j}\}_{13\leq j\leq 24}$ , so we may write $ \xi_t^* = \xi_t^*(x_t)$ .

It follows from Equations (5.5)–(5.7) and (5.3) that given the system state $ x_t$ and bid vector $ j=B_t$ at noon on day t, both the reward $ \tilde{g}_{i,j}(t)$ and the new state $ x_{t+1}$ are measurable with respect to the noise vector $ w_t$ , where

\begin{align*}w_t\,:\!=\,\left[\begin{array}{c} \big\{R_{t+1,j}\big\}_{1 \leq j \leq 24}\\[7pt] \Big\{H^k_{t+1/2-l/24k}\Big\}_{l\in\big\{0,\ldots,24\delta k\big\}}\end{array}\right],\end{align*}

which is not $ \mathcal F_t$ -measurable.

5.7. Algorithm

In this section we describe an implementation of the numerical scheme of Section 5.6, which is summarised in Algorithm 1. Code implementing this scheme, together with a risk-neutral variant, is available at https://github.com/moriartyjm/optimalswitching/tree/main/hydro. For practicality the implementation employs the neural networks shown in Figures 1 and 2.

Figure 1: Architecture of the bid neural network. Nodes $ d_1$ and $ d_2$ represent dense layers with sigmoid activation function.

Figure 2: Architecture of the value neural network. To reduce dimension, in the state vector $ x_t$ the production schedule $ \{P_{t,j}\}_{13\leq j\leq 24}$ is replaced by the sum of its entries. Nodes $ d_3$ and $ d_4$ represent dense layers with sigmoid activation function.

The bid neural network, whose architecture is given in Figure 1, aims to solve the following optimisation problem:

(5.13) \begin{align}\begin{cases}\xi^*_T(x_{T}) \in \text{arg} \, \text{max}_{j\in\mathcal I} \left\{{-}\frac{1}{\theta}\log\left(E^{x_{T}}_{T}\big[e^{-\theta \tilde g_{i,j}(T)}\big] \right)\right\},& {}\\[7pt]\xi^*_t(x_t) \in \text{arg} \, \text{max}_{j\in\mathcal I} \left\{{-}\frac{1}{\theta}\log\left(E_{t}^{x_{t}}\bigg[e^{-\theta \big(\tilde g_{i,j}(t) + \hat{V}^{j}_{t+1}(x_{t+1})\big)}\bigg] \right)\right\},& \text{for} \: 0\leq t<T,\end{cases}\end{align}

where $ x_{t} \mapsto E_{t}^{x_{t}}$ approximates the conditional expectation with respect to $ \mathcal F_t$ using the state vector, and $ \hat{V}^{j}_{t+1}(x_{t+1})$ approximates the continuation value using the value neural network, whose architecture is given in Figure 2. Continuation values $ \hat{V}^{j}_{T+1}(x_{T+1})$ are set equal to zero. Note that these equations do not simplify further since the rewards $ \tilde g_{ij}(t)$ are non-adapted.

The optimisation is performed by first training the bid neural network on M independent noise realisations with target values equal to zero and loss function equal to

\begin{equation*}-\frac{1}{\theta}\log\left(\frac{1}{M}\sum_{\ell=1}^{M}\Bigg[e^{-\theta \bigg(\tilde g_{i,\xi^*_t\big(x^{\ell}_{t}\big)}(t) + \hat{V}^{j}_{t+1}\big(x^{\ell}_{t+1}\big)\bigg)}\Bigg]\right),\end{equation*}

where $ x^{\ell}_{t}$ denotes the state vector $ x_t$ under the $ \ell$ th noise realisation. (Note that since the state $ x^{\ell}_{t}$ contains the production schedule $ \big\{P^\ell_{t,j}\big\}_{13\leq j\leq 24}$ , it also depends on the bid vector submitted at time $ t-1$ ; we omit this dependency in order to lighten the notation.) After the bid neural network has been trained, the value neural network is trained on the M independent noise realisations with target values equal to $ \exp\Big({-}\theta \Big(\tilde g_{i,\xi^*_t\big(x_t^\ell\big)}(t) + \hat{V}^{j}_{t+1}\big(x_{t+1}^\ell\big)\Big)\Big)$ and the mean squared error as the loss function. Initial reservoir levels $ M_{0,13}$ are drawn uniformly at random between $ M_{\text{min}}$ and $ M_{\text{max}}$ , while initial market prices $ R_{0,24}$ and weather values $\Big\{H^k_{t+1/2-l/24k}\Big\}_{l\in\big\{0,\ldots,24\delta k\big\}}$ are drawn from the corresponding stationary distribution.
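
In outline, the per-epoch training quantities described above take the following form, written here as a plain NumPy sketch with illustrative names; the repository's implementation may differ in detail. The arrays `rewards` and `continuation_values` hold, across the M noise realisations, the realised rewards $\tilde g_{i,\xi^*_t(x^\ell_t)}(t)$ and the continuation values $\hat V_{t+1}(x^\ell_{t+1})$.

```python
import numpy as np

def bid_network_objective(rewards, continuation_values, theta):
    """The quantity stated above as the bid network's loss: the empirical
    entropic certainty equivalent of reward plus continuation value."""
    total = np.asarray(rewards) + np.asarray(continuation_values)
    return -np.log(np.mean(np.exp(-theta * total))) / theta

def value_network_targets(rewards, continuation_values, theta):
    """Per-realisation targets exp(-theta (g + V_{t+1})) for the value network,
    which is then fitted under mean squared error."""
    total = np.asarray(rewards) + np.asarray(continuation_values)
    return np.exp(-theta * total)
```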

5.8. Numerical results and discussion

In this section we present and discuss numerical results obtained using Algorithm 1 over an optimisation horizon of 10 days and with 50,000 independent noise realisations. Identifying the risk-neutral case with $ \theta = 0$ , we plot results for $ \theta$ equal to 0, 0.01, and $ 0.02$ in blue (solid), orange (dashed), and green (dotted), respectively.

Figure 3: Average production curves by hour for risk sensitivity $ \theta=0, \, 0.01, \, 0.02$ (blue solid, orange dashed, and green dotted lines, respectively).

Figure 4: Plots for risk sensitivity $ \theta=0, 0.01, 0.02$ of the reservoir level $ M_t$ by hour: mean (thick blue solid, orange dashed, and green dotted lines, respectively) and 0.05 percentile (thinner lines).

For each hour in the optimisation, Figure 3 shows the production level under the respective optimal strategies, averaged across all noise realisations. Similarly, Figure 4 plots the mean water level under the optimal strategies, together with the 0.05 percentiles (thinner lines). In order to represent the value processes, Figure 5 plots the prediction $ \hat{V}_0(x_0)$ made by the value neural network when the input is $ x_0=\left[m, 0, \mathbf{0}, \mathbf{0}\right]^{{\small\textsf{T}}}$ , for different values of m (recall (5.12); the superscript $\textsf{T}$ denotes transpose).

The reservoir’s physical constraints $ M_{\text{min}}$ and $ M_{\text{max}}$ create risks for the hydropower producer. When the reservoir level is near $ M_{\text{min}}$ the producer risks being unable to fulfil the bid volume and receiving a penalty for under-production. Conversely, if the reservoir reaches its maximum level $ M_{\text{max}}$ then she risks spilling the water inflow, which would otherwise be stored and used profitably later.

Figure 5: Predictions made by the $ t=0$ value neural network for $ \theta=0, 0.01, 0.02$ (blue solid, orange dashed, and green dotted lines, respectively) for the input $ x_0=\left[m, 0, \mathbf{0}, \mathbf{0}\right]^{{\small\textsf{T}}}$ , for different initial reservoir levels m.

From Figure 4, the risk-neutral producer maintains the reservoir at an intermediate water level on average. Furthermore, in at least 5% of cases she allows the water level to fall rather close to the minimum level. In contrast, in at least 95% of cases the optimal strategy of the risk-averse producer first drives the initial water level up by trading less, and production increases only once the reservoir is at least approximately half filled. Indeed, for $ \theta=0.02$ the average water level is seen to increase towards $ M_{\text{max}}$ over the time horizon. Thus increases in $ \theta$ incentivise the producer to avoid the risk of under-production penalties. (The risk of spilling water at level $ M_{\text{max}}$ appears to have comparatively less influence on the optimal strategies.)

These observations are also borne out in Figure 5. In the risk-neutral case, the marginal value of water is approximately constant as the water level varies. However, locally around $ M_{\text{min}}$ , where the risk of penalties has more influence, the marginal value of water becomes lower as the risk sensitivity parameter $ \theta$ increases.

Figure 3 confirms that the risk-neutral strategy involves producing every day, and also involves following the daily price trend within each day. As the risk aversion parameter $ \theta$ increases, the number of production days, and also the total produced volume, decrease.

Appendix A. Properties of conditional risk mappings

Here we review definitions and preliminary results on conditional risk mappings that are used in the main text. References for this material include [Reference Frittelli and Rosazza Gianin9, Reference Detlefsen and Scandolo7, Reference Pflug and Pichler13, Reference Shapiro, Dentcheva and Ruszczynski19, Reference Ruszczyński and Shapiro18, Reference Cheridito and Kupper5, Reference Ruszczyński17, Reference Follmer and Schied8], among many others. Proofs are provided for results that are not readily available in these references.

We are given a probability space $ (\Omega,\mathcal{F},\mathbb{P})$ and a filtration $ \mathbb{G} = \{\mathcal{G}_{t}\}_{t \in \mathbb{T}}$ of sub- $ \sigma$ -algebras of $ \mathcal{F}$ . All random variables below are defined with respect to this probability space, and (in-) equalities between random variables are in the $ \mathbb{P}$ -almost-sure sense.

A.1. Conditional risk mappings

A $ \mathbb{G}$ -conditional risk mapping is a family of mappings $ \{\rho_{t}\}_{t \in \mathbb{T}}$ , $ \rho_{t} \colon L^{\infty}_{\mathcal{F}} \to L^{\infty}_{\mathcal{G}_{t}}$ , satisfying the following for all $ t \in \mathbb{T}$ :

Normalisation: $ \rho_{t}(0) = 0$ .

Conditional translation invariance: for all $ W \in L^{\infty}_{\mathcal{F}}$ and $ Z \in L^{\infty}_{\mathcal{G}_{t}}$ ,

\begin{equation*} \rho_{t}(Z + W) = Z + \rho_{t}(W). \end{equation*}

Monotonicity: for all $ W,Z \in L^{\infty}_{\mathcal{F}}$ ,

\begin{equation*} W \le Z \implies \rho_{t}(W) \le \rho_{t}(Z). \end{equation*}

For each $ t \in \mathbb{T}$ we refer to $ \rho_{t}$ as a conditional risk mapping. Note that in contrast to the one-step conditional risk mappings $ \rho_t$ of [Reference Ruszczyński17], whose respective domains would be $ L^{\infty}_{\mathcal{G}_{t+1}}$ in this context, here the domain of each $ \rho_t$ is $ L^{\infty}_{\mathcal{F}}$ . Conditional risk mappings and the monetary conditional risk mappings of [Reference Follmer and Schied8] are interchangeable via the mapping $ Z \mapsto \rho_{t}({-}Z)$ . Each $ \mathbb{G}$ -conditional risk mapping satisfies the following property (cf. [Reference Cheridito, Delbaen and Kupper4, Proposition 3.3], [Reference Follmer and Schied8, Exercise 11.1.2]):

Conditional locality: for every W and Z in $ L^{\infty}_{\mathcal{F}}$ , $ t \in \mathbb{T}$ , and $ A \in \mathcal{G}_{t}$ ,

\begin{equation*} \rho_{t}(\mathbf{1}_{A}W + \mathbf{1}_{A^{c}}Z) = \mathbf{1}_{A}\rho_{t}(W) + \mathbf{1}_{A^{c}}\rho_{t}(Z). \end{equation*}

A $ \mathbb{G}$ -conditional risk mapping is said to be strongly sensitive if it satisfies the following:

Strong sensitivity: for all $ W,Z \in L^{\infty}_{\mathcal{F}}$ and $ t \in \mathbb{T}$ ,

\begin{equation*} W \le Z \;\;\text{and}\;\; \rho_{t}(W) = \rho_{t}(Z) \iff W = Z. \end{equation*}

The strong sensitivity and monotonicity properties are sometimes jointly called the strict (or strong) monotonicity property.
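
As a toy illustration, the axioms above can be checked numerically for the conditional entropic mapping $\rho_t(W) = -\theta^{-1}\log\mathbb E\big[e^{-\theta W}\,\big|\,\mathcal G_t\big]$ on a finite probability space; the construction below (uniform probabilities, a two-atom partition generating $\mathcal G_t$, illustrative names) is not part of the framework and serves only as a sanity check.

```python
import numpy as np

prob = np.full(6, 1 / 6)                               # uniform P on {0,...,5}
atoms = [np.array([0, 1, 2]), np.array([3, 4, 5])]     # partition generating G_t
theta = 0.5

def rho_t(W):
    """Conditional entropic risk: constant on each atom of G_t."""
    out = np.empty(6)
    for A in atoms:
        pA = prob[A] / prob[A].sum()
        out[A] = -np.log(np.sum(pA * np.exp(-theta * W[A]))) / theta
    return out

rng = np.random.default_rng(1)
W = rng.normal(size=6)
Z = np.repeat(rng.normal(size=2), 3)                   # a G_t-measurable variable

assert np.allclose(rho_t(np.zeros(6)), 0.0)            # normalisation
assert np.allclose(rho_t(Z + W), Z + rho_t(W))         # conditional translation invariance
assert np.all(rho_t(W) <= rho_t(W + 1.0))              # monotonicity, since W <= W + 1
```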

A.2. Aggregated conditional risk mappings

A.2.1. Finite horizon

Where it simplifies notation we will write $ W_{s:t} = (W_{s},\ldots,W_{t})$ for tuples of length $ t-s+1$ , with $ W_{s:s} = W_{s}$ , and use the componentwise partial order $ W_{s:t} \le W^{\prime}_{s:t} \iff W_{r} \le W^{\prime}_{r},\; r = s,\ldots,t$ . If $ \alpha$ and $ \beta$ are real-valued random variables then we write $ \alpha W_{s:t} + \beta Z_{s:t} = (\alpha W_{s} + \beta Z_{s},\ldots,\alpha W_{t} + \beta Z_{t})$ .

Lemma A.1. The aggregated risk mapping $ \{\rho_{s,t}\}$ has the following properties: for all $ s,t \in \mathbb{T}$ with $ s \le t$ ,

Normalisation: $ \rho_{s,t}(0,\ldots,0) = 0$ .

Conditional translation invariance: for all $ \{W_{r}\}_{r = s}^{t} \in \otimes^{t-s+1} L^{\infty}_{\mathcal{F}}$ with $ W_{s} \in L^{\infty}_{\mathcal{G}_{s}}$,

\begin{equation*} \rho_{s,t}(W_{s},\ldots,W_{t}) = W_{s} + \rho_{s,t}(0,W_{s+1},\ldots,W_{t}). \end{equation*}

Monotonicity: for all $ \{W_{r}\}_{r = s}^{t}, \{Z_{r}\}_{r = s}^{t} \in \otimes^{t-s+1} L^{\infty}_{\mathcal{F}}$ ,

\begin{equation*} W_{s:t} \le Z_{s:t} \implies \rho_{s,t}(W_{s:t}) \le \rho_{s,t}(Z_{s:t}). \end{equation*}

Conditional locality: for all $ \{W_{r}\}_{r = s}^{t}$ and $ \{Z_{r}\}_{r = s}^{t}$ in $ \otimes^{t-s+1} L^{\infty}_{\mathcal{F}}$ ,

\begin{equation*} \rho_{s,t}(\mathbf{1}_{A}W_{s:t}+\mathbf{1}_{A^{c}}Z_{s:t}) = \mathbf{1}_{A}\rho_{s,t}(W_{s:t}) + \mathbf{1}_{A^{c}}\rho_{s,t}(Z_{s:t}), \;\; \forall\,A \in \mathcal{G}_{s}. \end{equation*}

Recursivity: for all $ s,r,t \in \mathbb{T}$ with $ 0 \le s < r \le t$ ,

\begin{equation*} \rho_{s,t}(W_{s:t}) = \rho_{s,r}(W_{s:r-1},\rho_{r,t}(W_{r:t})). \end{equation*}

Proof. The proof follows from expanding the recursive definition of $ \rho_{s,t}$ and using the properties of its generator.
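
As a sanity check, the recursive construction and the recursivity property can also be verified numerically on a small finite space. The sketch below assumes that the aggregated mapping is the backward composition $\rho_{s,t}(W_{s:t}) = \rho_s\big(W_s + \rho_{s+1,t}(W_{s+1:t})\big)$ (consistent with (A.2) below), with conditional entropic one-step mappings and refining partitions; all names are illustrative.

```python
import numpy as np

prob = np.full(8, 1 / 8)
partitions = {
    0: [np.arange(8)],
    1: [np.arange(4), np.arange(4, 8)],
    2: [np.arange(2), np.arange(2, 4), np.arange(4, 6), np.arange(6, 8)],
    3: [np.array([i]) for i in range(8)],
}
theta = 0.5

def rho(t, W):
    """One-step conditional entropic risk rho_t, constant on each atom of G_t."""
    out = np.empty(8)
    for A in partitions[t]:
        pA = prob[A] / prob[A].sum()
        out[A] = -np.log(np.sum(pA * np.exp(-theta * W[A]))) / theta
    return out

def rho_agg(s, t, W):
    """Aggregated mapping rho_{s,t}(W_s,...,W_t) via backward composition."""
    acc = rho(t, W[t])
    for u in range(t - 1, s - 1, -1):
        acc = rho(u, W[u] + acc)
    return acc

rng = np.random.default_rng(3)
W = [rng.normal(size=8) for _ in range(4)]

# recursivity (Lemma A.1): rho_{0,3}(W_{0:3}) = rho_{0,2}(W_0, W_1, rho_{2,3}(W_{2:3}))
lhs = rho_agg(0, 3, W)
rhs = rho_agg(0, 2, [W[0], W[1], rho_agg(2, 3, W)])
assert np.allclose(lhs, rhs)
```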

A.2.2. Infinite horizon

Lemma A.2. Recalling Definition 3.1, for all $ W\in H_\mathcal F$ we have

\begin{equation*} \varrho_{s}\big(W_{s},W_{s+1},\ldots\big) = \lim_{t \to \infty}\rho_{s,t}\big(W_{s},\ldots,W_{t}\big) \quad \forall s \in \mathbb{T}. \end{equation*}

Proof. Let $ W\in H_\mathcal F$ and $ \{k_{t}\}_{t \in \mathbb{T}}$ be as in the definition of $ H_\mathcal{F}$ . Set $ K_{t} \,:\!=\, \sum_{n \ge 0}k_{t+n}$ . Note that $ \{K_{t}\}_{t \in \mathbb{T}}$ is a non-negative, non-increasing deterministic sequence such that $ \lim_{t \to \infty}K_{t} = 0$ . For every $ 0 \le s \le t$ and $ n \ge 1$ ,

\begin{align*} \rho_{s,t+n}\big(W_{s},\ldots,W_{t+n}\big) & = \rho_{s,t+1}\big(W_{s},\ldots,W_{t},\rho_{t+1,t+n}\big(W_{t+1},\ldots,W_{t+n}\big)\big) \\ & \le \rho_{s,t+1}\left(W_{s},\ldots,W_{t},\sum_{m=1}^{n}k_{t+m}\right) \\ & = \rho_{s,t}(W_{s},\ldots,W_{t}) + \sum_{m=1}^{n}k_{t+m}. \end{align*}

Similarly we have

\begin{equation*} \rho_{s,t+n}(W_{s},\ldots,W_{t+n}) \ge \rho_{s,t}(W_{s},\ldots,W_{t}) - \sum_{m=1}^{n}k_{t+m}, \end{equation*}

and we conclude that $ \mathbb{P}$ -almost surely, the sequence $ \{\rho_{s,t}(W_{s},\ldots,W_{t})\}_{t \in \mathbb{T}}$ is Cauchy.

Lemma A.3. For all $ W\in H_\mathcal F$ we have

\begin{equation*} \varrho_{s}(W_{s},W_{s+1},\ldots) = \rho_{s,s+1}(W_{s}, \varrho_{s+1}(W_{s+1},W_{s+2},\ldots)). \end{equation*}

Proof. Arguing as in the proof of Lemma A.2, there is a deterministic positive sequence $ \{K_t\}_{t\in\mathbb T}$ , with $ \lim_{t \to \infty} K_t = 0$ , such that for every $ 0 \le s < t$ we have

\begin{align*} |\varrho_{s+1}(W_{s+1},W_{s+2},\ldots)-\rho_{s+1,t}(W_{s+1},\ldots,W_t)|\leq K_t \quad \text{almost surely.} \end{align*}

The monotonicity and conditional translation invariance of $ \rho_{s+1,t}$ imply that

\begin{align*} \rho_{s,s+1}(W_{s}, \varrho_{s+1}(W_{s+1},W_{s+2},\ldots))&\leq\rho_{s,s+1}(W_{s}, \rho_{s+1,t}(W_{s+1},\ldots,W_t)+K_t) \\ &=\rho_{s,t}(W_{s},W_{s+1},\ldots,W_t)+K_t. \end{align*}

Taking the limit as $ t\to\infty$ we find that

\begin{align*} \rho_{s,s+1}(W_{s}, \varrho_{s+1}(W_{s+1},W_{s+2},\ldots))\leq \varrho_{s}(W_{s}, W_{s+1},\ldots). \end{align*}

A similar argument can be applied to find the reverse inequality.

All of the properties in Lemma A.1 for finite sequences extend to infinite sequences in $ H_\mathcal{F}$ with $ \varrho_{s}$ playing the role of $ \rho_{s,\infty}$ .

A.3. Martingales for aggregated conditional risk mappings

We close by presenting elementary martingale theory for aggregated conditional risk mappings (see also [Reference Follmer and Schied8, Reference Krätschmer and Schoenmakers11]).

Let $ f = \{f_{t}\}_{t \in \mathbb{T}}$ be a sequence in $ L^{\infty}_{\mathcal{F}}$ . We say that $ W \in \mathcal L_\mathbb{G}^{\infty}$ is an f-extended $ \{\rho_{s,t}\}$ -submartingale (-supermartingale) if

\begin{equation*} W_{s} \le ({\ge})\, \rho_{s,t}\big(f_{s},\ldots,f_{t-1},W_{t}\big), \quad 0 \le s \le t, \end{equation*}

and an f-extended $ \{\rho_{s,t}\}$ -martingale if it has both these properties. Note that we use the convention

\begin{equation*} \rho_{s,t}\big(f_{s},\ldots,f_{t-1},W_{t}\big) = \rho_{t,t}(W_{t}) \;\; \text{if}\;\; s = t. \end{equation*}

If $ f \equiv 0$ then the qualifier ‘f-extended’ is omitted.

Lemma A.4. The defining property of an f-extended $ \{\rho_{s,t}\}$ -submartingale (-supermartingale) W is equivalent to the one-step property,

\begin{equation*} W_{t} \le ({\ge})\, \rho_{t,t+1}(f_{t},W_{t+1}),\;\; t \in \mathbb{T}. \end{equation*}

Proof. If $ \{W_{t}\}_{t \in \mathbb{T}}$ is a one-step f-extended $ \{\rho_{s,t}\}$ -submartingale, then for all $ s,t \in \mathbb{T}$ such that $ s < t$ we have

\begin{align*} \rho_{s,t}\big(\,f_{s},\ldots,f_{t-1},W_{t}\big) & = \rho_{s,t-1}\big(\,f_{s},\ldots,f_{t-2},\rho_{t-1,t}(\,f_{t-1},W_{t})\big) \\ & \ge \rho_{s,t-1}\big(\,f_{s},\ldots,f_{t-2},W_{t-1}\big) \ldots \ge W_{s}. \end{align*}

The case $ s = t$ and the converse implication that an f-extended $ \{\rho_{s,t}\}$ -submartingale satisfies the one-step property are both trivial and thus omitted.

Lemma A.5. (Doob decomposition) Let $ W \in \mathcal L_\mathbb{G}^{\infty}$ . There exists an almost surely unique $ \{\rho_{s,t}\}$ -martingale M and $ \mathbb{G}$ -predictable process A such that $ M_{0} = A_{0}$ and

(A.1) \begin{equation} W_{t} = W_{0} + M_{t} + A_{t}. \end{equation}

The processes A and M are defined recursively as follows:

\begin{align*} & \begin{cases} A_{0} = 0,\\[2pt] A_{t+1} = A_{t} + \big(\rho_{t}\big(W_{t+1}\big) - W_{t}\big), \;\; t \in \mathbb{T}, \end{cases} \\[4pt] & \begin{cases} M_{0} = 0,\\[2pt] M_{t+1} = M_{t} + \big(W_{t+1} - \rho_{t}\big(W_{t+1}\big)\big), \;\; t \in \mathbb{T}. \end{cases} \end{align*}

If W is a $ \{\rho_{s,t}\}$ -submartingale (-supermartingale) then A is increasing (decreasing).

Proof. This is proved in the same way as Lemma 5.1 of [Reference Krätschmer and Schoenmakers11].
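
For instance, in the risk-neutral special case $\rho_t = \mathbb E[\,\cdot\mid\mathcal G_t]$ the recursion above reproduces the classical Doob decomposition. The sketch below (a finite space with uniform probabilities and refining partitions, illustrative names) computes A and M and checks (A.1) together with the one-step martingale property of M.

```python
import numpy as np

prob = np.full(8, 1 / 8)
partitions = [
    [np.arange(8)],                                                      # G_0 (trivial)
    [np.arange(4), np.arange(4, 8)],                                     # G_1
    [np.arange(2), np.arange(2, 4), np.arange(4, 6), np.arange(6, 8)],   # G_2
]

def cond_exp(X, atoms):
    """Conditional expectation given the sigma-algebra generated by `atoms`."""
    out = np.empty(8)
    for A in atoms:
        out[A] = np.average(X[A], weights=prob[A])
    return out

rng = np.random.default_rng(2)
# an adapted process: W_t is constant on each atom of G_t
W = [np.repeat(rng.normal(size=len(p)), 8 // len(p)) for p in partitions]

A, M = [np.zeros(8)], [np.zeros(8)]
for t in range(2):
    rho_next = cond_exp(W[t + 1], partitions[t])        # rho_t(W_{t+1})
    A.append(A[t] + rho_next - W[t])
    M.append(M[t] + W[t + 1] - rho_next)

for t in range(3):
    assert np.allclose(W[t], W[0] + M[t] + A[t])         # the decomposition (A.1)
for t in range(2):
    assert np.allclose(cond_exp(M[t + 1], partitions[t]), M[t])   # one-step martingale property
```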

A.3.1. Optional stopping properties

First let $ \tau \in \mathscr{T}\,$ be a stopping time. For sequences $ \{f_t\}_{t \in \mathbb{T}}$ and $ \{W_t\}_{t \in \mathbb{T}}$ in $ H_\mathcal F$ , define the aggregated cost $ \rho_{t,\tau}(f_t,\ldots,f_{\tau-1},W_{\tau})$ as

(A.2) \begin{equation} \rho_{t,\tau}(f_t,\ldots,f_{\tau-1},W_{\tau}) = \begin{cases} 0,& \text{on} \kern.5em \{\tau < t\},\\ \rho_{t}(W_{t}),& \text{on} \kern.5em \{\tau = t\},\\ \rho_{t}\big(f_t+\rho_{t+1,\tau}(f_{t+1},\ldots,f_{\tau-1},W_{\tau})\big),& \text{on} \kern.5em \{\tau > t\}. \end{cases} \end{equation}

Given another stopping time $ \varsigma \in \mathscr{T}$, define the aggregated cost $ \rho_{\varsigma,\tau}(f_{\varsigma},\ldots,f_{\tau-1},W_{\tau})$ as

(A.3) \begin{equation} \begin{split} \rho_{\varsigma,\tau}(f_{\varsigma},\ldots,f_{\tau-1},W_{\tau}) & = \sum_{t \in \mathbb{T}}\mathbf{1}_{\{\varsigma = t\}}\rho_{t,\tau}(f_{t},\ldots,f_{\tau-1},W_{\tau}) \\[2pt] & = \begin{cases} 0,& \text{on} \kern.5em \{\tau < \varsigma\},\\[2pt] \rho_{\varsigma}(W_{\varsigma}),& \text{on} \kern.5em \{\tau = \varsigma\},\\[2pt] \rho_{\varsigma}\big(f_{\varsigma}+\rho_{\varsigma+1,\tau}(f_{\varsigma+1},\ldots,f_{\tau-1},W_{\tau})\big),& \text{on} \kern.5em \{\tau > \varsigma\}. \end{cases} \end{split} \end{equation}

Without loss of generality we can assume $ \tau \ge t$ and $ \tau \ge \varsigma$ in (A.2) and (A.3) respectively. The following lemma shows that the recursive property of aggregated conditional risk mappings extends to stopping times.

Lemma A.6. If $ \varsigma$ , $ \tilde{\varsigma}$ , and $ \tau$ are bounded stopping times in $ \mathscr{T}$ such that $ \varsigma \le \tilde{\varsigma} \le \tau$ , then for all sequences $ \{f_{t}\}_{t \in \mathbb{T}}$ and $ \{W_{t}\}_{t \in \mathbb{T}}$ in $ \mathcal L^\infty_{\mathcal{F}}$ we have

\begin{equation*} \rho_{\varsigma,\tau}\big(f_{\varsigma},\ldots,f_{\tau-1},W_{\tau}\big) = \rho_{\varsigma,\tilde{\varsigma}}\big(f_{\varsigma},\ldots,f_{\tilde{\varsigma} - 1},\rho_{\tilde{\varsigma},\tau}\big(f_{\tilde{\varsigma}},\ldots,f_{\tau-1},W_{\tau}\big)\big). \end{equation*}

Proof. Since $ \tau$ is bounded it follows that $ \tau \in \mathscr{T}_{[0,T]}$ for some integer $ 0 < T < \infty$ . Furthermore, by (A.3) it suffices to prove for all $ 0 \le t \le T$ that

(A.4) \begin{align} \mathbf{1}_{\{\tilde{\varsigma} \ge t\}}\rho_{t,\tilde{\varsigma}}\big(f_{t},\ldots,f_{\tilde{\varsigma} - 1},\rho_{\tilde{\varsigma},\tau}\big(f_{\tilde{\varsigma}},\ldots,f_{\tau - 1},W_{\tau}\big)\big) = \mathbf{1}_{\{\tilde{\varsigma} \ge t\}}\rho_{t,\tau}\big(f_{t},\ldots,f_{\tau-1},W_{\tau}\big). \end{align}

By decomposing $ \{\tilde{\varsigma} \ge t\}$ into the disjoint events $ \{\tilde{\varsigma} = t\}$ and $ \{\tilde{\varsigma} \ge t+1\}$ we have

\begin{align*} & \mathbf{1}_{\{\tilde{\varsigma} \ge t\}}\rho_{t,\tilde{\varsigma}}\big(f_{t},\ldots,f_{\tilde{\varsigma} - 1},\rho_{\tilde{\varsigma},\tau}\big(f_{\tilde{\varsigma}},\ldots,f_{\tau - 1},W_{\tau}\big)\big) = \mathbf{1}_{\{\tilde{\varsigma} = t\}}\rho_{t,\tau}\big(f_{t},\ldots,f_{\tau-1},W_{\tau}\big) \\ & + \mathbf{1}_{\{\tilde{\varsigma} \ge t+1\}}\rho_{t,t+1}\big(f_{t},\rho_{t+1,\tilde{\varsigma}}\big(f_{t+1},\ldots,f_{\tilde{\varsigma} - 1}, \rho_{\tilde{\varsigma},\tau}\big(f_{\tilde{\varsigma}},\ldots,f_{\tau - 1},W_{\tau}\big)\big)\big). \end{align*}

If $ t < T$ and if (A.4) holds with $ t+1$ in place of t, then using conditional translation invariance we get

\begin{align*} {} & \mathbf{1}_{\{\tilde{\varsigma} \ge t\}}\rho_{t,\tilde{\varsigma}}\big(f_{t},\ldots,f_{\tilde{\varsigma} - 1},\rho_{\tilde{\varsigma},\tau}\big(f_{\tilde{\varsigma}},\ldots,f_{\tau - 1},W_{\tau}\big)\big) \\& \quad = \mathbf{1}_{\{\tilde{\varsigma} = t\}}\rho_{t,\tau}\big(f_{t},\ldots,f_{\tau-1},W_{\tau}\big) \\ & \qquad + \mathbf{1}_{\{\tilde{\varsigma} \ge t+1\}}\rho_{t,t+1}\big(f_{t},\rho_{t+1,\tilde{\varsigma}}\big(f_{t+1},\ldots,f_{\tilde{\varsigma} - 1},\rho_{\tilde{\varsigma},\tau}\big(f_{\tilde{\varsigma}},\ldots,f_{\tau - 1},W_{\tau}\big)\big)\big) \\ & \quad = \mathbf{1}_{\{\tilde{\varsigma} = t\}}\rho_{t,\tau}\big(f_{t},\ldots,f_{\tau-1},W_{\tau}\big) \\ & \qquad + \mathbf{1}_{\{\tilde{\varsigma} \ge t+1\}}\rho_{t,t+1}\big(f_{t},\rho_{t+1,\tau}\big(f_{t+1},\ldots,f_{\tau-1},W_{\tau}\big)\big) \\ & \quad = \mathbf{1}_{\{\tilde{\varsigma} \ge t\}}\rho_{t,\tau}\big(f_{t},\ldots,f_{\tau-1},W_{\tau}\big), \end{align*}

and we conclude using backward induction.

Acknowledgements

The authors would like to thank the editor and two anonymous referees for their suggestions, which helped improve the presentation of the paper.

Funding information

This work was partially supported by the EPSRC through grants no. EP/N013492/1 and EP/P002625/1, by the Lloyd’s Register Foundation–Alan Turing Institute programme on data-centric engineering under the LRF grant G0095, and by the Swedish Energy Agency through grants no. 42982-1 and 48405-1.

Competing interests

There were no competing interests to declare which arose during the preparation or publication process for this article.

Data

The code used in Section 5 can be found at https://github.com/moriartyjm/optimalswitching/tree/main/hydro.

References

An, L., Cohen, S. N. and Ji, S. (2013). Reflected backward stochastic difference equations and optimal stopping problems under g-expectation. Preprint. Available at https://arxiv.org/abs/1305.0887.
Bäuerle, N. and Jaśkiewicz, A. (2018). Stochastic optimal growth model with risk sensitive preferences. J. Econom. Theory 173, 181–200.
Carmona, R. and Ludkovski, M. (2010). Valuation of energy storage: an optimal switching approach. Quant. Finance 10, 359–374.
Cheridito, P., Delbaen, F. and Kupper, M. (2006). Dynamic monetary risk measures for bounded discrete-time processes. Electron. J. Prob. 11, 57–106.
Cheridito, P. and Kupper, M. (2011). Composition of time-consistent dynamic monetary risk measures in discrete time. Internat. J. Theoret. Appl. Finance 14, 137–162.
Cohen, S. N. and Elliott, R. J. (2011). Backward stochastic difference equations and nearly time-consistent nonlinear expectations. SIAM J. Control Optimization 49, 125–139.
Detlefsen, K. and Scandolo, G. (2005). Conditional and dynamic convex risk measures. Finance Stoch. 9, 539–561.
Follmer, H. and Schied, A. (2016). Stochastic Finance: an Introduction in Discrete Time, 4th edn. De Gruyter, Berlin.
Frittelli, M. and Rosazza Gianin, E. (2002). Putting order in risk measures. J. Banking Finance 26, 1473–1486.
Kose, U. and Ruszczynski, A. (2020). Risk-averse learning by temporal difference methods. Preprint. Available at https://arxiv.org/abs/2003.00780v1.
Krätschmer, V. and Schoenmakers, J. (2010). Representations for optimal stopping under dynamic monetary utility functionals. SIAM J. Financial Math. 1, 811–832.
Lundström, N. L. P., Olofsson, M. and Önskog, T. (2020). Management strategies for run-of-river hydropower plants—an optimal switching approach. Preprint. Available at https://arxiv.org/abs/2009.10554.
Pflug, G. C. and Pichler, A. (2014). Multistage Stochastic Optimization. Springer, Cham.
Pichler, A. and Schlotter, R. (2020). Martingale characterizations of risk-averse stochastic optimization problems. Math. Program. 181, 377–403.
Pichler, A. and Shapiro, A. (2019). Risk averse stochastic programming: time consistency and optimal stopping. Preprint. Available at https://arxiv.org/abs/1808.10807.
Rieder, U. (1976). On optimal policies and martingales in dynamic programming. J. Appl. Prob. 13, 507–518.
Ruszczyński, A. (2010). Risk-averse dynamic programming for Markov decision processes. Math. Program. 125, 235–261.
Ruszczyński, A. and Shapiro, A. (2006). Conditional risk mappings. Math. Operat. Res. 31, 544–561.
Shapiro, A., Dentcheva, D. and Ruszczynski, A. (2014). Lectures on Stochastic Programming, 2nd edn. Society for Industrial and Applied Mathematics/Mathematical Optimization Society, Philadelphia.
Shen, Y., Stannat, W. and Obermayer, K. (2013). Risk-sensitive Markov control processes. SIAM J. Control Optimization 51, 3652–3672.
Ugurlu, K. (2018). Robust optimal control using conditional risk mappings in infinite horizon. J. Comput. Appl. Math. 344, 275–287.