Generalization of the signum function to matrices
In mathematics, the matrix sign function is a matrix function on square matrices analogous to the complex sign function.[1]
It was introduced by J. D. Roberts in 1971, in a Cambridge University technical report, as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations; the report was later published in a journal in 1980.[2][3]
The matrix sign function is a generalization of the complex signum function

$$\operatorname{csgn}(z) = \begin{cases} 1 & \text{if } \operatorname{Re}(z) > 0, \\ -1 & \text{if } \operatorname{Re}(z) < 0, \end{cases}$$

to the matrix-valued analogue $\operatorname{csgn}(A)$. Although the sign function is not analytic, the matrix function is well defined for all matrices that have no eigenvalues on the imaginary axis; see, for example, the Jordan-form-based definition below (where the derivatives are all zero).
Theorem: Let $A \in \mathbb{C}^{n \times n}$. Then $\operatorname{csgn}(A)^2 = I$.[1]
Theorem: Let $A \in \mathbb{C}^{n \times n}$. Then $\operatorname{csgn}(A)$ is diagonalizable and has eigenvalues that are $\pm 1$.[1]
Theorem: Let $A \in \mathbb{C}^{n \times n}$. Then $(I + \operatorname{csgn}(A))/2$ is a projector onto the invariant subspace associated with the eigenvalues in the right half-plane, and analogously for $(I - \operatorname{csgn}(A))/2$ and the left half-plane.[1]
Theorem: Let $A \in \mathbb{C}^{n \times n}$, and let

$$A = P \begin{bmatrix} J_+ & 0 \\ 0 & J_- \end{bmatrix} P^{-1}$$

be a Jordan decomposition such that $J_+$ corresponds to eigenvalues with positive real part and $J_-$ to eigenvalues with negative real part. Then

$$\operatorname{csgn}(A) = P \begin{bmatrix} I_+ & 0 \\ 0 & -I_- \end{bmatrix} P^{-1},$$

where $I_+$ and $I_-$ are identity matrices of sizes corresponding to $J_+$ and $J_-$, respectively.[1]
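As an illustration of this characterization, here is a minimal NumPy sketch (the function name `matrix_sign_eig` is ours, not from the references) that computes $\operatorname{csgn}(A)$ from an eigendecomposition; it assumes $A$ is diagonalizable with no eigenvalues on the imaginary axis, so the general Jordan machinery is not needed.

```python
import numpy as np

def matrix_sign_eig(A):
    """Matrix sign via eigendecomposition (illustrative sketch).

    Assumes A is diagonalizable with no purely imaginary eigenvalues,
    so csgn(A) = V * diag(sign(Re(lambda_i))) * V^{-1}.
    """
    w, V = np.linalg.eig(A)
    D = np.diag(np.sign(w.real))
    return V @ D @ np.linalg.inv(V)

# Example: eigenvalues 1 and -2, so the result must square to the identity.
A = np.array([[1.0, 3.0], [0.0, -2.0]])
S = matrix_sign_eig(A)
assert np.allclose(S @ S, np.eye(2))
```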
Computational methods
The function can be computed with generic methods for matrix functions, but there are also specialized methods.
The Newton iteration can be derived by observing that $\operatorname{csgn}(x) = \sqrt{x^2}/x$, which in terms of matrices can be written as $\operatorname{csgn}(A) = A^{-1}\sqrt{A^2}$, where we use the matrix square root. If we apply the Babylonian method to compute the square root of the matrix $A^2$, that is, the iteration

$$X_{k+1} = \frac{1}{2}\left(X_k + A^2 X_k^{-1}\right),$$

and define the new iterate $Z_k = A^{-1} X_k$, we arrive at the iteration

$$Z_{k+1} = \frac{1}{2}\left(Z_k + Z_k^{-1}\right),$$

where typically $Z_0 = A$. Convergence is global, and locally it is quadratic.[1][2]
The Newton iteration uses the explicit inverse of the iterates $Z_k$.
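A minimal NumPy sketch of this Newton iteration, under the assumption that $A$ has no eigenvalues on the imaginary axis (the function name and stopping rule are our choices):

```python
import numpy as np

def sign_newton(A, tol=1e-12, max_iter=100):
    """Newton iteration Z_{k+1} = (Z_k + Z_k^{-1}) / 2 with Z_0 = A.

    Each step inverts the current iterate explicitly; converges globally
    (and locally quadratically) when A has no imaginary-axis eigenvalues.
    """
    Z = A.astype(complex)
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        # Stop when the relative change (in the 1-norm) is small.
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
            return Z_next
        Z = Z_next
    return Z
```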
Newton–Schulz iteration
To avoid the need for the explicit inverse used in the Newton iteration, the inverse can be approximated with one step of the Newton iteration for the inverse,

$$Z_k^{-1} \approx Z_k\left(2I - Z_k^2\right),$$

derived by Schulz in 1933.[4] Substituting this approximation into the previous method, the new method becomes

$$Z_{k+1} = \frac{1}{2} Z_k \left(3I - Z_k^2\right).$$
Convergence is (still) quadratic, but only local (guaranteed for $\|I - A^2\| < 1$).[1]
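The corresponding inverse-free sketch, again with illustrative naming; recall that convergence is only guaranteed when $\|I - A^2\| < 1$:

```python
import numpy as np

def sign_newton_schulz(A, tol=1e-12, max_iter=100):
    """Newton-Schulz iteration Z_{k+1} = Z_k (3I - Z_k^2) / 2 with Z_0 = A.

    Inverse-free (multiplication-rich), but convergence is only local,
    guaranteed when ||I - A^2|| < 1.
    """
    I = np.eye(A.shape[0])
    Z = A.astype(complex)
    for _ in range(max_iter):
        Z_next = 0.5 * Z @ (3 * I - Z @ Z)
        if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
            return Z_next
        Z = Z_next
    return Z
```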
Solutions of Sylvester equations
Theorem:[2][3] Let $A, B, C \in \mathbb{R}^{n \times n}$ and assume that $A$ and $B$ are stable. Then the unique solution to the Sylvester equation, $AX + XB = C$, is given by $X$ such that

$$\begin{bmatrix} -I & 2X \\ 0 & I \end{bmatrix} = \operatorname{csgn}\left(\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix}\right).$$
Proof sketch: The result follows from the similarity transform

$$\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix} = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & -B \end{bmatrix} \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}^{-1},$$
since, due to the stability of $A$ and $B$ (which gives $\operatorname{csgn}(A) = -I$ and $\operatorname{csgn}(-B) = I$),

$$\operatorname{csgn}\left(\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix}\right) = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} -I & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix} = \begin{bmatrix} -I & 2X \\ 0 & I \end{bmatrix}.$$
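As an illustration of how the theorem can be used numerically, the following sketch reads $X$ off the upper-right block of the matrix sign; it reuses the `sign_newton` sketch from above, and all names are ours:

```python
import numpy as np

def solve_sylvester_sign(A, B, C):
    """Solve AX + XB = C for stable A and B via the matrix sign function.

    csgn([[A, -C], [0, -B]]) = [[-I, 2X], [0, I]], so X is half the
    upper-right block. (Tiny imaginary parts may remain numerically.)
    """
    n = A.shape[0]
    M = np.block([[A, -C], [np.zeros((n, n)), -B]])
    S = sign_newton(M)  # Newton iteration sketch from above
    return 0.5 * S[:n, n:]

# Example with stable A and B (all eigenvalues in the left half-plane).
A = np.array([[-2.0, 1.0], [0.0, -3.0]])
B = np.array([[-1.0, 0.0], [2.0, -4.0]])
C = np.array([[1.0, 2.0], [3.0, 4.0]])
X = solve_sylvester_sign(A, B, C)
assert np.allclose(A @ X + X @ B, C)
```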
The theorem is, naturally, also applicable to the Lyapunov equation. However, due to the structure, the Newton iteration simplifies to involving only inverses of $A$ and $A^T$.
Solutions of algebraic Riccati equations
There is a similar result applicable to the algebraic Riccati equation, $A^H P + P A - P F P + Q = 0$.[1][2] Define $V, W \in \mathbb{C}^{2n \times n}$ as

$$\begin{bmatrix} V & W \end{bmatrix} = \operatorname{csgn}\left(\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix}\right) - \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}.$$

Under the assumption that $F, Q \in \mathbb{C}^{n \times n}$ are Hermitian and there exists a unique stabilizing solution, in the sense that $A - FP$ is stable, that solution is given by the over-determined, but consistent, linear system

$$VP = -W.$$
Proof sketch: The similarity transform

$$\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} \begin{bmatrix} -(A - FP) & -F \\ 0 & (A - FP)^H \end{bmatrix} \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix}^{-1},$$

and the stability of $A - FP$ imply that

$$\left(\operatorname{csgn}\left(\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix}\right) - \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}\right) \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} \begin{bmatrix} 0 & Y \\ 0 & -2I \end{bmatrix},$$

for some matrix $Y \in \mathbb{C}^{n \times n}$. Reading off the first block column gives $VP + W = 0$, that is, $VP = -W$.
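A corresponding sketch for the Riccati case, forming $[V\ W]$ and solving the over-determined system $VP = -W$ by least squares; it reuses the `sign_newton` sketch from above, and the names are illustrative:

```python
import numpy as np

def solve_riccati_sign(A, F, Q):
    """Solve A^H P + P A - P F P + Q = 0 via the matrix sign function.

    Assumes F, Q are Hermitian and a unique stabilizing solution exists
    (A - FP stable).
    """
    n = A.shape[0]
    H = np.block([[A.conj().T, Q], [F, -A]])
    S = sign_newton(H) - np.eye(2 * n)  # Newton iteration sketch from above
    V, W = S[:, :n], S[:, n:]
    # Over-determined but consistent system V P = -W, solved in
    # the least-squares sense.
    P, *_ = np.linalg.lstsq(V, -W, rcond=None)
    return P

# Scalar example: 2p - p^2 + 2 = 0 has stabilizing root p = 1 + sqrt(3).
P = solve_riccati_sign(np.array([[1.0]]), np.array([[1.0]]), np.array([[2.0]]))
assert np.isclose(P[0, 0].real, 1 + np.sqrt(3))
```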
Computation of the matrix square root
The Denman–Beavers iteration for the square root of a matrix can be derived from the Newton iteration for the matrix sign function by noticing that $A - PIP = 0$ is a degenerate algebraic Riccati equation[3] and that, by definition, a solution $P$ is the square root of $A$.
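For completeness, a minimal sketch of the Denman–Beavers iteration itself, assuming $A$ has no eigenvalues on the closed negative real axis (the standard condition for the principal square root):

```python
import numpy as np

def sqrtm_denman_beavers(A, tol=1e-12, max_iter=100):
    """Denman-Beavers iteration: Y_k -> sqrt(A) and Z_k -> inv(sqrt(A)).

    Y_0 = A, Z_0 = I;  Y_{k+1} = (Y_k + Z_k^{-1})/2,
                       Z_{k+1} = (Z_k + Y_k^{-1})/2.
    """
    Y = A.astype(complex)
    Z = np.eye(A.shape[0], dtype=complex)
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        if np.linalg.norm(Y_next - Y, 1) <= tol * np.linalg.norm(Y_next, 1):
            return Y_next
        Y, Z = Y_next, Z_next
    return Y

A = np.array([[4.0, 1.0], [0.0, 9.0]])
R = sqrtm_denman_beavers(A)
assert np.allclose(R @ R, A)
```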
References

1. Higham, Nicholas J. (2008). Functions of Matrices: Theory and Computation. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). ISBN 978-0-89871-777-8. OCLC 693957820.
2. Roberts, J. D. (October 1980). "Linear model reduction and solution of the algebraic Riccati equation by use of the sign function". International Journal of Control. 32 (4): 677–687. doi:10.1080/00207178008922881. ISSN 0020-7179.
3. Denman, Eugene D.; Beavers, Alex N. (1976). "The matrix sign function and computations in systems". Applied Mathematics and Computation. 2 (1): 63–94. doi:10.1016/0096-3003(76)90020-5. ISSN 0096-3003.
4. Schulz, Günther (1933). "Iterative Berechung der reziproken Matrix". ZAMM – Zeitschrift für Angewandte Mathematik und Mechanik. 13 (1): 57–59. Bibcode:1933ZaMM...13...57S. doi:10.1002/zamm.19330130111. ISSN 1521-4001.