I. INTRODUCTION
Alternatives to the classical Cartesian transmitter, which uses linear power amplifiers (PAs) with a constant supply, are being investigated to overcome its poor power efficiency with high peak-to-average power ratio (PAPR) signals. The Doherty architecture, for example, has been adopted for base stations, where several manufacturers (e.g. Freescale, NXP) offer PAs with average efficiencies of up to 50% and even more [1]. However, other promising structures, such as envelope elimination and restoration (EE&R) [2, 3], envelope tracking (ET), or polar transmitters with delta-sigma modulation [4], are still being considered as candidates to surpass the efficiency of the Doherty PA. From the implementation point of view, ET is a very attractive technique because it can be applied to conventional transmitters based on linear RF amplification topologies by simply replacing the classical static supply with a dynamic one.
One of the main constraints on the maximum efficiency achievable by ET transmitters is the envelope modulator, or envelope amplifier (EA), since the overall efficiency of an ET architecture is the product of the PA and EA power efficiencies. The envelope bandwidth (BW) is several times wider than the BW of the baseband complex modulated signal (theoretically, it is infinite), which is critical when considering current wideband signals with high PAPR. Some companies, such as Nujira (www.nujira.com), MaXentric (www.maxentric.com) and Quantance (www.quantance.com), already offer ET solutions with average efficiencies above 60% for WCDMA and LTE signals.
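Because the overall efficiency is the product of the PA and EA efficiencies, any EA loss directly reduces the efficiency gained at the PA. The short Python sketch below makes this arithmetic explicit. The function name and the 70%/85% efficiency values are hypothetical and chosen only for illustration.

```python
def et_overall_efficiency(eta_pa: float, eta_ea: float) -> float:
    """Overall ET efficiency as the product of PA and EA efficiencies (both in [0, 1]).

    Hypothetical helper for illustration; the efficiency values are not measured data.
    """
    if not (0.0 <= eta_pa <= 1.0 and 0.0 <= eta_ea <= 1.0):
        raise ValueError("efficiencies must lie in [0, 1]")
    return eta_pa * eta_ea


if __name__ == "__main__":
    # A 70 % efficient PA fed by an 85 % efficient EA yields 59.5 % overall.
    print(f"{et_overall_efficiency(0.70, 0.85):.3f}")
```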
One of the main challenges for the EA is to supply the power required by the transistor as fast as the signal's envelope varies. In dual-band applications, for example, this becomes even more challenging, since the combined envelope can occupy a BW more than five times the carrier separation. Therefore, in order to relax the EA requirements, several solutions have been proposed to reduce the BW and slew rate (SR) of the original signal's envelope [5–8]. Unfortunately, supplying the PA drain with a slower version of the envelope not only degrades the overall efficiency but also introduces nonlinear distortion in the amplified signal. Despite this efficiency and linearity degradation, supplying the PA with a slower envelope can still be of interest in applications where BW must be traded off against efficiency because of EA limitations. To compensate for the nonlinear distortion that arises when using the SR-limited version of the original envelope, a slow envelope-dependent digital predistorter (SED-DPD) is required [5, 9, 10].
The rest of this paper is organized as follows. The BW versus efficiency trade-off in EAs is discussed in Section II. The design of the DPD required to compensate for the nonlinear distortion that arises when the PA is supplied with a slower version of the signal's envelope is presented in Section III. Some field programmable gate array (FPGA)-oriented implementation architectures for real-time DPD are discussed in Section IV. Finally, conclusions are given in Section V.
II. DYNAMIC SUPPLY OF THE PA WITH SLOW VERSIONS OF THE SIGNAL'S ENVELOPE
In an ET system (see Fig. 1), the supply voltage is dynamically adjusted to track the RF envelope at high instantaneous power. The supply voltage can be shaped according to different criteria. By means of a so-called shaping function, the shape of the supply voltage (which must somehow follow the instantaneous RF envelope) can be adapted to achieve one of the following objectives: optimum efficiency, isogain [11–13], or SR- and BW-reduced shaping [14].

Fig. 1. General block diagram of an ET PA with DPD.
Focusing on this latter objective, two different approaches based on SR and BW reduction of the RF signal's envelope have shown that these strategies can adapt the envelope characteristics to the EA requirements or limitations at the expense of some efficiency degradation. On the one hand, the method proposed in [5, 6] limits the BW of the envelope iteratively, which may be an issue in real-time applications. On the other hand, the method proposed in [8] is a real-time algorithm whose resulting envelope is limited in SR but not in BW. This makes its amplification challenging if only a switched-mode EA is considered, or requires a wide bandwidth if only a linear EA is considered. Therefore, in [14], the SR reduction algorithm proposed in [8] was modified to also restrict the BW of the resulting slow envelope. Moreover, thanks to its simplicity, this algorithm is suitable for implementation in a digital signal processor. Fig. 2 shows the original RF signal's envelope, an SR-reduced version of the original envelope (SR-reduced envelope, SRRE) and a BW-reduced version of the original envelope (BW-reduced envelope, BWRE) in both the time and frequency domains. The parameter N (defined in [8]) is related to the maximum allowed increment in the signal's slope. For example, N = 100 corresponds to an SR reduction of 96% and a BW reduction of 64% with respect to the original signal's envelope. The results shown in Fig. 2 were obtained from an implementation of this algorithm on a Virtex-4 FPGA clocked at 60 MHz.
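A simple two-pass slew-rate limiter can illustrate the idea of a supply envelope that stays above the original one while respecting a slope limit. This is a hypothetical stand-in written for this discussion, not the algorithm of [8] or [14]. It bounds only the sample-to-sample slope, not the slope increment governed by N, and it does not explicitly restrict BW. Its backward pass also needs look-ahead, which a real-time version would have to provide with a finite buffer.

```python
from typing import List, Sequence


def slew_limited_envelope(env: Sequence[float], max_step: float) -> List[float]:
    """Hypothetical two-pass slew-rate limiter (not the algorithm of [8] or [14]).

    Returns an envelope that never falls below `env` and whose sample-to-sample
    change is at most `max_step` in either direction.
    """
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    out = list(env)
    # Forward pass: limit the falling slope (hold the supply up after each peak).
    for n in range(1, len(out)):
        out[n] = max(out[n], out[n - 1] - max_step)
    # Backward pass: limit the rising slope (start ramping up before each peak).
    # This pass needs look-ahead; a real-time version would use a finite buffer.
    for n in range(len(out) - 2, -1, -1):
        out[n] = max(out[n], out[n + 1] - max_step)
    return out


if __name__ == "__main__":
    print(slew_limited_envelope([0, 0, 1, 0, 0], 0.25))
```

For example, `slew_limited_envelope([0, 0, 1, 0, 0], 0.25)` returns `[0.5, 0.75, 1, 0.75, 0.5]`. The supply starts rising before the peak and decays slowly after it, so it never falls below the original envelope.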

Fig. 2. Waveforms and spectra of the envelope and its SR and BW limited versions [14].
As reported in [15], the efficiency decays roughly linearly with the BW reduction, whereas it follows a logarithmic trend with the SR reduction. As a consequence, for applications with high-BW signals (e.g. dual-band transmissions), it is possible to find a trade-off that meets both the SR and BW requirements of the EA while keeping a reasonably good drain efficiency.
Unfortunately, using the SR- and BW-limited envelope (or simply slow envelope, $E_s$) to supply the power transistor's drain generates a particular type of nonlinear distortion. Fig. 3 shows the AM–AM characteristics for different ranges of $E_s$ values. As observed in Fig. 3, the ET PA shows a nonlinear, varying gain because the slow envelope used to supply the PA and the RF input signal do not have a one-to-one relationship. Therefore, a given input amplitude can produce a range of different outputs, depending on the instantaneous value of the dynamic power supply. In other words, the ET PA exhibits an SED nonlinear behavior.
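A toy memoryless model can reproduce this non-unique AM–AM relation. In the model, the output saturates at a level proportional to the instantaneous supply. The model, its `tanh` soft-saturation law and its parameter values are assumptions for illustration only, not a model of the DUT used in this paper.

```python
import math


def toy_et_pa(u: complex, e_s: float, gain: float = 2.0, k: float = 1.0) -> complex:
    """Hypothetical memoryless ET PA whose output saturates softly at k * e_s."""
    if e_s <= 0:
        raise ValueError("the supply must be positive")
    mag = abs(u)
    if mag == 0.0:
        return 0j
    out_mag = k * e_s * math.tanh(gain * mag / (k * e_s))  # supply-dependent compression
    return complex(u) / mag * out_mag  # keep the input phase (no AM-PM in this toy)


if __name__ == "__main__":
    for e_s in (0.6, 0.8, 1.0):
        # Same input amplitude, different supply values -> different output amplitudes.
        print(e_s, round(abs(toy_et_pa(0.5, e_s)), 4))
```

The same input amplitude of 0.5 yields three different output amplitudes for the three supply values. This is qualitatively the kind of spread seen in Fig. 3.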

Fig. 3. AM–AM characteristics of the PA when considering only three ranges of $E_s$ (left) and all possible values of $E_s$ (right).
III. DESIGN OF A REAL-TIME DPD FOR ET
The type of low-pass equivalent black-box behavioral model required to characterize the nonlinear distortion that arises when applying ET depends on the strategy (or shaping function) used to supply the PA. On the one hand, if the PA drain voltage follows the same shape as the RF signal's envelope (despite being bounded at low voltage levels), typical behavioral models such as the memory polynomial (MP) [7] can be used for DPD purposes. On the other hand, if the slow envelope is used to supply the PA, the DPD has to include information about the slow envelope in order to compensate for this type of nonlinear distortion.
For the case of using the original envelope, we can consider the implementation of a DPD based on the simple MP model. Following the notation in Fig. 1, the input–output relationship of the MP DPD is defined as
$$x[n] = \sum_{i=0}^{N} u[n-\tau_i]\, f_i\big(|u[n-\tau_i]|\big) \qquad (1)$$
where the nonlinear functions $f_i(\cdot)$ can be described by polynomials of order $P$
$$f_i\big(|u[n-\tau_i]|\big) = \sum_{p=0}^{P} \gamma_{pi}\, |u[n-\tau_i]|^{p} \qquad (2)$$
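A direct, sample-by-sample Python sketch of the MP DPD is given below. It assumes that the predistorted output is $x[n] = \sum_i u[n-\tau_i]\, f_i(|u[n-\tau_i]|)$, where each $f_i$ is a polynomial in $|u[n-\tau_i]|$ with complex coefficients $\gamma_{pi}$, p = 0, ..., P. The function name, the argument layout and the zero history before the first sample are assumptions of this sketch. The coefficients would come from an identification step that is not shown.

```python
from typing import List, Sequence


def mp_dpd(u: Sequence[complex], gammas: Sequence[Sequence[complex]],
           taus: Sequence[int]) -> List[complex]:
    """Memory-polynomial predistorter (illustrative sketch).

    x[n] = sum_i u[n - taus[i]] * f_i(|u[n - taus[i]]|),
    f_i(r) = sum_p gammas[i][p] * r**p, p = 0..P.
    Samples before the start of the record are treated as zero.
    """
    if len(gammas) != len(taus):
        raise ValueError("one coefficient list is needed per tap delay")
    x: List[complex] = []
    for n in range(len(u)):
        acc = 0j
        for coeffs, tau in zip(gammas, taus):
            if n - tau < 0:
                continue  # zero history before the first sample
            s = u[n - tau]
            r = abs(s)
            f = sum(g * r ** p for p, g in enumerate(coeffs))  # f_i(|u[n - tau_i]|)
            acc += s * f
        x.append(acc)
    return x
```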
As previously explained, when the slow envelope is used to supply the PA, the resulting nonlinear distortion cannot be compensated by conventional dynamic behavioral models such as the MP [10]. Therefore, a dynamic SED behavioral model was proposed in [9] to compensate for this type of nonlinear distortion. The input–output relationship of the SED-DPD is defined as
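$$x[n] = \sum_{i=0}^{N} \sum_{j=0}^{M} \sum_{q=0}^{Q} \sum_{p=0}^{P} \gamma_{pqij}\, u[n-\tau_i]\, |u[n-\tau_i]|^{p}\, E_s^{q}[n-\tau_j] \qquad (3)$$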
where $E_s[n]$ is the SR-limited version of the original envelope, $u[n]$ is the input signal, and $\tau_j$ and $\tau_i$ (with $\tau_0 = 0$) are the most significant tap delays of the slow envelope and the input signal, respectively, which contribute to the characterization of memory effects.
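The SED-DPD can be sketched in the same way. This sketch assumes a model in which each delayed input sample $u[n-\tau_i]$ is weighted by two factors: a polynomial in $|u[n-\tau_i]|$ of degree P, and a power $E_s^q[n-\tau_j]$ of the delayed slow envelope (q = 0, ..., Q). The complex coefficients are indexed as `gammas[i][j][q][p]`. This indexing and the function signature are hypothetical, and the exact form used in [9] may differ.

```python
from typing import List, Sequence

# gammas[i][j][q][p] weights u[n - taus_u[i]] * |u[n - taus_u[i]]|**p * e_s[n - taus_e[j]]**q
Coeffs = Sequence[Sequence[Sequence[Sequence[complex]]]]


def sed_dpd(u: Sequence[complex], e_s: Sequence[float], gammas: Coeffs,
            taus_u: Sequence[int], taus_e: Sequence[int]) -> List[complex]:
    """Slow-envelope-dependent predistorter (illustrative sketch, assumed model form)."""
    if len(u) != len(e_s):
        raise ValueError("u and e_s must have the same length")
    x: List[complex] = []
    for n in range(len(u)):
        acc = 0j
        for i, tau_i in enumerate(taus_u):
            if n - tau_i < 0:
                continue  # zero input history before the first sample
            s = u[n - tau_i]
            r = abs(s)
            for j, tau_j in enumerate(taus_e):
                if n - tau_j < 0:
                    continue  # zero slow-envelope history as well
                e = e_s[n - tau_j]
                for q, poly in enumerate(gammas[i][j]):
                    f = sum(g * r ** p for p, g in enumerate(poly))  # polynomial in |u|
                    acc += s * f * e ** q  # weighted by E_s^q[n - tau_j]
        x.append(acc)
    return x
```

Unlike the MP sketch, the output here changes with the slow envelope even when the input sequence is identical. This is the dependence that a conventional MP DPD cannot capture.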
Figure 4 shows the linearized and unlinearized AM–AM characteristics of an ET PA when the PA is supplied with the original envelope (MP DPD used) and with a slower version of the original envelope (SED-DPD used). The linearity performance of the SED-DPD, in terms of out-of-band distortion compensation, can be observed in Fig. 5. These particular results were measured on an instrumentation-based test bed, schematically depicted in Fig. 1 and described in [10]. The device under test (DUT) is a Cree Inc. CGH40006P-TB evaluation board (GaN transistor) operating at 2 GHz with a mean output power of 28 dBm. For the sake of simplicity, a linear LT1210 IC was used as the envelope driver. The PAPR of the baseband signals ranges from around 8 to 11 dB, depending on the type of signal used (single-carrier M-QAM or OFDM). For the SED-DPD, we used the following configuration: P = 9, Q = 2, M = 3 and N = 1 (alternatively, N = 0).

Fig. 4. Linearized and unlinearized AM-AM characteristics of an ET PA considering: (a) the original envelope (left), (b) a slow envelope (right).

Fig. 5. Unlinearized and linearized (dynamic SED-DPD) output power spectra of a single-carrier 16-QAM signal (left) and an OFDM 16-QAM signal (right).
IV. FPGA IMPLEMENTATION ARCHITECTURES
The FPGA implementation of an MP DPD follows the structure presented in Fig. 6. Each branch represents one nonlinear function expressed as a polynomial. To allow an accurate and efficient FPGA implementation of the MP DPD, it is important to minimize both the number of arithmetic operations (counting additions and multiplications) and the accumulated error inside the FPGA. Both issues can be addressed by using Horner's rule, which limits the number of consecutive complex multiplications to a maximum of two. Moreover, as presented in [16], a large variation in the magnitude of the polynomial coefficients would require a large number of bits to preserve the precision of the computation. To avoid it, it is possible to use the ratios of adjacent coefficients. Reformulating (2) according to Horner's rule, the nonlinear functions $f_i(\cdot)$ can be described as
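$$f_i\big(|u[n-\tau_i]|\big) = \gamma_{0i} + |u[n-\tau_i]|\Big(\gamma_{1i} + |u[n-\tau_i]|\big(\gamma_{2i} + \cdots + |u[n-\tau_i]|\,(\gamma_{(P-1)i} + \gamma_{Pi}\,|u[n-\tau_i]|)\cdots\big)\Big) \qquad (4)$$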

Fig. 6. Block diagram of the MP DPD (left) and the SED-DPD (right).
Therefore, taking into account the polynomial expression in (2), where $\gamma_{pi} \in \mathbb{C}$, each monomial $\gamma_{pi}\, |u[n-\tau_i]|^{p}$ with $p \geq 1$ takes $p + 1$ real multiplications ($p - 1$ to form the real power and 2 to scale the complex coefficient). Summing the terms takes 2P real additions (P complex additions), which results in $P(P+7)/2$ arithmetic operations for a polynomial of degree P. In contrast, with the formulation in (4), the computation starts with the innermost parentheses, using the coefficient of the highest-degree monomial, and works outward. At each step it multiplies the previous result by $|u[n-\tau_i]|$ and adds the coefficient of the monomial of the next lower degree. This takes only 4P arithmetic operations for a polynomial of degree P (two real multiplications and two real additions per step). Horner's rule is therefore much more computationally efficient for high polynomial orders; for example, for P = 9 it requires 36 operations instead of 72. Figure 7 shows the structure of the nonlinear branches of the MP DPD in Fig. 6. Alternatively, instead of using polynomials to describe the nonlinear functions $f_i(\cdot)$, it would have been possible to use basic predistortion cells (BPCs) [17]. A BPC is composed of a RAM block acting as a look-up table (LUT), an address calculator and complex multipliers.
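A small Python sketch can check the operation counts discussed above. It evaluates a polynomial with complex coefficients both directly and with Horner's rule. The helper names are hypothetical. The counting functions simply encode the two formulas given in the text, P(P+7)/2 and 4P, rather than instrumenting the arithmetic.

```python
from typing import Sequence


def poly_direct(coeffs: Sequence[complex], r: float) -> complex:
    """Direct form: sum_p coeffs[p] * r**p, each power computed separately."""
    return complex(sum(c * r ** p for p, c in enumerate(coeffs)))


def poly_horner(coeffs: Sequence[complex], r: float) -> complex:
    """Horner's rule: start from the highest-degree coefficient and work outward,
    multiplying by the real magnitude r and adding the next lower-degree coefficient."""
    acc = complex(coeffs[-1])
    for c in reversed(coeffs[:-1]):
        # Conceptually: complex x real (2 real mults) + complex add (2 real adds).
        acc = acc * r + c
    return acc


def ops_direct(P: int) -> int:
    """Real operations for the direct form, as counted in the text: P(P + 7) / 2."""
    return P * (P + 7) // 2


def ops_horner(P: int) -> int:
    """Real operations with Horner's rule: 4 per step, P steps."""
    return 4 * P


if __name__ == "__main__":
    coeffs = [1.0, -0.2 + 0.1j, 0.05j, 0.01]  # hypothetical gamma_pi, P = 3
    print(poly_direct(coeffs, 0.7), poly_horner(coeffs, 0.7))
    print(ops_direct(9), ops_horner(9))
```

With P = 9, the order used for the SED-DPD in Section III, `ops_direct(9)` is 72 and `ops_horner(9)` is 36. Both evaluators return the same value up to rounding.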

Fig. 7. Structure of one of the branches of the MP DPD (see Fig. 6) using Horner's rule.
In order to implement the dynamic SED-DPD in an FPGA device, the polynomial model in (3) is expressed as a combination of several BPCs [9]:
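$$f_{iqj}\big(|u[n-\tau_i]|\big) = \sum_{p=0}^{P} \gamma_{pqij}\, |u[n-\tau_i]|^{p} \equiv G_{LUT_{iqj}}\big(|u[n-\tau_i]|\big) \qquad (5)$$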
which yields the following expression for the SED-DPD:
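$$x[n] = \sum_{i=0}^{N} \sum_{q=0}^{Q} \sum_{j=0}^{M} u[n-\tau_i]\, E_s^{q}[n-\tau_j]\, G_{LUT_{iqj}}\big(|u[n-\tau_i]|\big) \qquad (6)$$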
with $G_{LUT_{iqj}}$ being the complex LUT gains.
Figure 6 shows the general block diagram of the SED-DPD architecture, where the nonlinear functions $f_{iqj}(\cdot)$ can be expressed as a combination of BPCs. The number of BPCs forming this SED-DPD is #BPCs = (Q + 1)(N + 1)(M + 1). This structure requires fewer arithmetic operations than the polynomial one; however, it consumes more memory resources.
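For the configuration used in Section III (Q = 2, M = 3 and N = 1, or alternatively N = 0), the BPC count follows directly from the formula above. The memory estimate in the sketch is purely illustrative. The 256-entry LUT depth and the 32 bits per complex gain (16-bit I and Q) are hypothetical values, not those of the actual implementation.

```python
def num_bpcs(Q: int, N: int, M: int) -> int:
    """Number of BPCs in the SED-DPD: (Q + 1)(N + 1)(M + 1)."""
    return (Q + 1) * (N + 1) * (M + 1)


def lut_memory_bits(num_cells: int, lut_entries: int, bits_per_gain: int) -> int:
    """Total LUT storage, assuming one LUT of `lut_entries` complex gains per BPC."""
    return num_cells * lut_entries * bits_per_gain


if __name__ == "__main__":
    for n_taps in (1, 0):
        cells = num_bpcs(Q=2, N=n_taps, M=3)
        # Hypothetical LUT sizing: 256 entries, 16-bit I + 16-bit Q per complex gain.
        print(f"N = {n_taps}: {cells} BPCs, {lut_memory_bits(cells, 256, 32)} bits of LUT memory")
```

This gives 24 BPCs for N = 1 and 12 for N = 0. Under the assumed LUT sizing, that corresponds to 196,608 and 98,304 bits, respectively.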
Figure 8 shows the basic structure of a BPC. A dual-port RAM, with two independent sets of ports for simultaneous reading and writing, allows the complex LUT gains to be updated continuously without interrupting normal data transmission. Thanks to this LUT-based architecture, the DPD function can be adapted continuously by means of the least mean squares (LMS) algorithm [17].
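A behavioral Python model of one BPC with LMS adaptation can make the LUT-based structure more concrete. Several parts of this sketch are assumptions: the class and method names, the uniform amplitude indexing used as address calculator, and the plain complex LMS update $G \leftarrow G + \mu\, e\, u^*$ applied to the addressed entry. The adaptation scheme in [17] may differ. In a real DPD loop, the desired signal would be produced by the identification architecture rather than known in advance.

```python
from typing import List


class BasicPredistortionCell:
    """Hypothetical behavioral model of one BPC.

    address(): uniform amplitude indexing of |u| into the LUT (address calculator).
    process(): read port, output = u * G[addr] (complex multiplier).
    lms_update(): write port, complex LMS on the addressed gain.
    """

    def __init__(self, size: int, max_mag: float = 1.0) -> None:
        if size < 1 or max_mag <= 0:
            raise ValueError("size must be >= 1 and max_mag > 0")
        self.size = size
        self.max_mag = max_mag
        self.lut: List[complex] = [1.0 + 0j] * size  # start as a unit-gain cell

    def address(self, u: complex) -> int:
        idx = int(abs(u) / self.max_mag * self.size)
        return min(idx, self.size - 1)  # clip amplitudes at or above max_mag to the last bin

    def process(self, u: complex) -> complex:
        return u * self.lut[self.address(u)]

    def lms_update(self, u: complex, desired: complex, mu: float) -> None:
        addr = self.address(u)
        err = desired - u * self.lut[addr]
        self.lut[addr] += mu * err * u.conjugate()  # G <- G + mu * e * conj(u)
```

As a test, the cell can be trained on a synthetic target, using inputs at the centre of each amplitude bin with desired output `u * target[k]`. Every LUT entry then converges to its target gain. The convergence rate of each entry is set by $\mu |u|^2$, so low-amplitude bins adapt more slowly.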

Fig. 8. Basic architecture of a BPC forming the SED-DPD (see Fig. 6).
V. CONCLUSION
In this paper, we have presented and discussed two computationally efficient design strategies for implementing real-time DPD in an FPGA device when considering ET PAs. As discussed throughout the paper, when slow versions of the original envelope are used to perform ET, the resulting nonlinear distortion has to be compensated using DPD architectures that depend not only on the input data and its memory, but also on the drain voltage signal (slow envelope) and its memory. Of the two efficient architectures presented for real-time FPGA implementation of the DPD function, one is based on polynomials and the other on LUTs. The trade-off between these two configurations is the number of arithmetic operations versus the memory resource requirements. In any case, the linearization performance of both architectures has been validated in several papers [9, 16]. Finally, another key issue for a computationally efficient FPGA implementation is the design of the identification/adaptation process. One possibility is the use of LMS-based solutions, as in [17], where the coefficients (or complex LUT gains) are continuously updated. Alternatively, if more complex least-squares-type algorithms are considered, the coefficient update procedure can be relocated to embedded software running on a MicroBlaze soft processor core, as in [18].
ACKNOWLEDGEMENT
This work was supported by the Spanish Government (MINECO) under project TEC2011-29126-C03-02.
Pere L. Gilabert received the degree in Telecommunication Engineering from UPC in 2002, and completed his Master's thesis at the University of Rome “La Sapienza” with an exchange grant. He joined the TSC department in 2003 and received his Ph.D. from UPC in 2008, for which he was awarded the Extraordinary Doctoral Prize. He is an associate professor at UPC, where his research activity is in the field of linearization techniques and highly efficient transmitter architectures.
Gabriel Montoro received the M.S. degree in Telecommunication Engineering in 1990 and his Ph.D. degree in 1996, both from UPC. He joined the TSC department in 1991, where he is currently an associate professor. His early research was in the area of adaptive control, and his main research interest is now the use of signal processing strategies to improve efficiency in communications systems.