Author = 'Yubo Cai'
Email = 'yubo.cai@polytechnique.edu'
import sympy as sp
import numpy as np
from sympy import symbols, Eq, solve
import matplotlib.pyplot as plt
import scipy as sc
from scipy.optimize import minimize
from scipy.optimize import minimize_scalar
from scipy.optimize import linprog
Question 1
In a research article you see the following figure. The author says that the downward pointing spikes correspond to solutions $\left\{x_i\right\}$ of some problem of interest and that they are found using the Golden Search method.
A list of the solutions $\left\{x_i\right\}$ is given and the author claims they are precise up to 12 significant digits. Knowing that the computations were made using a machine precision which can give 16 significant digits, do you believe this claim? Justify your answer.
Answer to Question 1 |
No, I do not believe this claim: 12 significant digits are not attainable here. By Remark 1.2 (textbook, page 17), when a zero-order method such as the Golden Search is used to bracket the minimum, the achievable precision is only of order $\sqrt{\varepsilon}$, where $\varepsilon$ is the machine precision. The argument is as follows:
We take the second-order Taylor expansion around the minimum: $$ f(x) \approx f\left(x^*\right)+\frac{1}{2} f^{\prime \prime}\left(x^*\right)\left(x-x^*\right)^2. $$ Suppose that $\varepsilon$ is the machine precision (typically around $10^{-16}$). If $x_2<\varepsilon x_1$ then $x_2$ is seen as zero when compared to $x_1$; in particular $x_1+x_2$ will numerically be equal to $x_1$.
Coming back to our Taylor expansion, if $\frac{1}{2}\left|f^{\prime \prime}\left(x^*\right)\right|\left(x-x^*\right)^2<\varepsilon\left|f\left(x^*\right)\right|$, where $\varepsilon$ is the machine epsilon, then numerically we do not see any difference between $f(x)$ and $f\left(x^*\right)$. In conclusion, the algorithm cannot tell the difference between $f(x)$ and $f\left(x^*\right)$ whenever $$ \left|x-x^*\right| \leq \sqrt{\varepsilon}\left|x^*\right| \sqrt{\frac{2\left|f\left(x^*\right)\right|}{\left(x^*\right)^2\left|f^{\prime \prime}\left(x^*\right)\right|}}. $$ This implies that the precision of zero-order methods is only of order $\sqrt{\varepsilon}$.
Therefore, the theoretical precision in this case is $\sqrt{\varepsilon} \approx 10^{-8}$, which is not enough to claim that the solutions are accurate to 12 significant digits.
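To make this concrete, here is a minimal illustrative sketch (a toy quadratic of my own choosing, not the function from the article): near the minimizer, points closer than roughly $\sqrt{\varepsilon} \approx 10^{-8}$ to $x^*$ produce function values that are bitwise identical to $f(x^*)$, so no method based only on function-value comparisons can certify 12 significant digits.
import numpy as np

# Toy example: f(x) = 1 + (x - 1)^2 with minimizer x* = 1 and f(x*) = 1.
f = lambda x: 1.0 + (x - 1.0) ** 2
x_star = 1.0

eps = np.finfo(float).eps  # machine precision, about 2.2e-16
print('sqrt(eps) =', np.sqrt(eps))
for delta in [1e-6, 1e-7, 1e-8, 1e-9, 1e-12]:
    same = f(x_star + delta) == f(x_star)
    print(f'delta = {delta:.0e}: f(x* + delta) == f(x*) ? {same}')
# For offsets around or below sqrt(eps) ~ 1.5e-8 the two values coincide in double
# precision, which is exactly the sqrt(epsilon) limitation discussed above.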
Question 2
Suppose $f: \mathbb{R} \rightarrow \mathbb{R}$ is a strictly convex, unimodal function with global minimum $x^*$. Does Newton's algorithm $$ x_{i+1}=x_i-f^{\prime}\left(x_i\right) / f^{\prime \prime}\left(x_i\right) $$ converge to $x^*$ regardless of the initialization? What about the Gradient Descent (GD) algorithm with Wolfe line-search? (no proofs, just references or examples/counter-examples)
x = sp.Symbol('x')
f_x = sp.sqrt(1 + x**2)
# Calculate the first and second derivatives of f(x)
f_prime = f_x.diff(x)
f_double_prime = f_prime.diff(x)
display(sp.simplify(f_prime))
display(sp.simplify(f_double_prime))
# Compute the expression x_{i+1} = x_i - f'(x_i) / f''(x_i)
x_next = x - f_prime / f_double_prime
display(sp.simplify(x_next))
Answer to Question 2 |
For Newton's algorithm, whether the method converges depends on the initialization. Here is a typical counter-example showing that it does not converge for every starting point.
Consider the function $f(x)=\sqrt{1+x^{2}}$, whose minimizer is $x^* = 0$. By Newton's method we have $$ \begin{aligned} x_{i+1}&=x_i-f^{\prime}\left(x_i\right) / f^{\prime \prime}\left(x_i\right) \\ &= -x_{i}^{3}. \\ \end{aligned} $$ Therefore, if $\left|x_0\right| \geq 1$ the Newton algorithm does not converge (e.g. $x_{0}=2$, $x_{1}=-8$, and $x_{i+1}=-x_{i}^3$ keeps growing in magnitude), while for $\left|x_0\right| < 1$ it converges to $0$.
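A quick numerical check of this counter-example (added here as an illustration, using the closed form $x_{i+1}=-x_i^3$ derived above):
import numpy as np

# Newton iteration for f(x) = sqrt(1 + x^2); the update simplifies to x_{i+1} = -x_i^3.
def newton_step(x):
    f_prime = x / np.sqrt(1 + x**2)           # f'(x)
    f_double_prime = (1 + x**2) ** (-1.5)     # f''(x)
    return x - f_prime / f_double_prime       # equals -x**3

for x0 in [0.5, 2.0]:
    x = x0
    for _ in range(5):
        x = newton_step(x)
    print(f'x0 = {x0}: after 5 Newton steps, x = {x:.3e}')
# |x0| < 1 converges quickly to 0, while |x0| >= 1 blows up.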
For the Gradient Descent (GD) algorithm with Wolfe line-search, the method converges for every initialization. We use the reference from the textbook, page 27. Denote $q(t)=f(x+td)$, where $d$ is a descent direction; then $$ q^{\prime}(0)=f^{\prime}(x) d<0 $$ by the definition of a descent direction. Applying Proposition 1.15 (Wolfe line-search, textbook page 29): since $f$ is strictly convex and unimodal, $q$ is bounded below and $q^{\prime}(0)=f^{\prime}(x) d<0$, so the line search with the Wolfe rule finishes in a finite number of steps, and the resulting descent iterates converge to $x^*$.
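A small sketch supporting this claim (an illustration using scipy.optimize.line_search, which enforces the Wolfe conditions; the stopping tolerance and iteration cap are arbitrary choices): gradient descent with a Wolfe line search converges for $f(x)=\sqrt{1+x^2}$ even from $x_0=2$, where the pure Newton iteration above diverges.
import numpy as np
from scipy.optimize import line_search

f = lambda x: np.sqrt(1.0 + x[0] ** 2)                        # objective (1D, stored as an array)
grad = lambda x: np.array([x[0] / np.sqrt(1.0 + x[0] ** 2)])  # its gradient

x = np.array([2.0])
for it in range(50):
    g = grad(x)
    if abs(g[0]) < 1e-10:
        break
    d = -g                                   # steepest-descent direction
    alpha = line_search(f, grad, x, d)[0]    # step length satisfying the Wolfe conditions
    if alpha is None:                        # line search failed (does not happen here)
        break
    x = x + alpha * d
print(f'GD with Wolfe line search: x = {x[0]:.3e} after {it} iterations')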
Question 3
a) What is the main drawback when using the Gradient Descent algorithm in higher dimensions? Give an example of a strictly convex function for which GD will take a long time to converge.
b) Suppose you have a function $f: \mathbb{R}^2 \rightarrow \mathbb{R}$ which is smooth with a unique global minimum $x^*$ and that the gradient evaluated at the point $x_0$ is $\nabla f\left(x_0\right)=\left(10^6, 10^{-6}\right)$. Is this function ill-conditioned? Do you expect the Gradient Descent algorithm to work well for this function?
c) Consider the quadratic function $f(x)=x^T A x-b^T x$ for a $2 \times 2$ matrix $A$ with eigenvalues $\lambda_1=$ $0.01, \lambda_2=10^6$. What is the condition number of the matrix $A$ ? Will the GD algorithm converge towards the unique minimum of $f$ regardless of the initialization? What about the speed of convergence?
Answer to Question 3 |
a) The main drawback of the Gradient Descent algorithm in higher dimensions is that the convergence can be very slow: the speed depends strongly on the choice of the step $t$ and on the conditioning of the problem. (For non-convex functions, GD may also get stuck in local minima rather than the global one.)
In the strongly convex case the rate of convergence is linear, and the speed is dictated by the condition number of $f$: when this condition number is large, GD may converge very slowly. For example, consider the ill-conditioned quadratic problem $$ f(x)=x^T A x, \quad A=\left(\begin{array}{cc} 0.1 & 0 \\ 0 & 2000 \end{array}\right), \quad x_0=(-0.5,1.5), \quad Q=20000, $$ where $Q$ is the condition number of $A$. With a fixed-step algorithm it takes on the order of $10^4$ iterations just to get close to the optimal solution, as the short sketch below illustrates.
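A minimal sketch of this example (the fixed step below is an arbitrary choice, taken just under the stability threshold $1/\lambda_{\max}$):
import numpy as np

# Fixed-step gradient descent on f(x) = x^T A x with A = diag(0.1, 2000).
A = np.diag([0.1, 2000.0])
grad = lambda x: 2.0 * A @ x            # gradient of x^T A x

x = np.array([-0.5, 1.5])
t = 4.9e-4                              # fixed step; stability requires t < 1/lambda_max = 5e-4
for _ in range(10000):
    x = x - t * grad(x)
print('distance to the minimizer (the origin) after 10000 iterations:', np.linalg.norm(x))
# The steep direction (lambda = 2000) forces a tiny step, so the flat direction
# (lambda = 0.1) only contracts by a factor of about 1 - 2*t*0.1 per iteration.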
b) Here $f: \mathbb{R}^2 \rightarrow \mathbb{R}$, and by the definition of the Jacobian matrix, $$ J_f(x_0) = \nabla f\left(x_0\right)=\left(10^6, 10^{-6}\right). $$ Using the standard definition of the relative condition number of a differentiable function, $$ \operatorname{cond}(f, x_0)=\frac{\|x_0\|\,\|J_f(x_0)\|}{\|f(x_0)\|} = \frac{\|x_0\|\,\|\nabla f\left(x_0\right)\|}{\|f(x_0)\|}. $$ Since $\nabla f\left(x_0\right)=\left(10^6, 10^{-6}\right)$, the norm $\|\nabla f(x_0)\|$ is of order $10^6$, so this condition number is typically huge; moreover the two gradient components differ by twelve orders of magnitude, meaning the function varies on completely different scales in the two coordinate directions. The function is therefore ill-conditioned, and Gradient Descent is not expected to work well: any step small enough for the steep direction makes almost no progress in the flat one.
c) Recall Definition 2.8 (condition number of a matrix $A$, textbook page 47): $$ \boxed{\text{Let $A$ be a symmetric positive-definite matrix. If $0<\lambda_{\min }<\lambda_{\max }$ are its smallest and largest eigenvalues, the number $Q=\lambda_{\max } / \lambda_{\min }$ is called the condition number of $A$.}} $$ Since $f$ is a quadratic function with a unique minimum, $A$ is symmetric positive-definite, and its condition number is $$ Q=\lambda_{\max } / \lambda_{\min } = \frac{10^6}{0.01} = 10^8. $$ Because $f$ is strongly convex, GD with a suitably chosen step size (or a line search) does converge to the unique minimum from any initialization; however, with a fixed step that is too large it can diverge, so in practice the behavior depends on the step-size choice, and in all cases the speed is governed by the very large condition number.
We then compute the convergence ratio of $f$: the Steepest Descent algorithm applied to a strongly convex quadratic form $f$ with condition number $Q$ converges linearly with convergence ratio at most $$ 1-\frac{4 Q}{(1+Q)^2}=\left(\frac{Q-1}{Q+1}\right)^2. $$ With $Q=10^8$ this ratio is extremely close to 1, so the convergence is very slow. One can instead use preconditioning techniques or more advanced optimization algorithms, such as Conjugate Gradient, Newton's method, or quasi-Newton methods like BFGS or L-BFGS, which often have better convergence properties for ill-conditioned problems.
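A quick computation of this ratio for $Q=10^8$ (illustration only) shows how many steepest-descent iterations would be needed just to gain one decimal digit of accuracy:
import numpy as np

Q = 1e8
ratio = ((Q - 1) / (Q + 1)) ** 2                   # contraction factor per iteration
iters_per_digit = np.log(10) / (-np.log(ratio))    # iterations to gain one decimal digit
print(f'convergence ratio = {ratio:.10f}')
print(f'iterations needed per decimal digit of accuracy ~ {iters_per_digit:.2e}')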
Question 4
You have to optimize a function $f: \mathbb{R}^{100000} \rightarrow \mathbb{R}$. The function is explicit and you are able to compute the Hessian matrix. You observe that the Hessian matrix is full. Can you use Newton's method in order to efficiently minimize the function $f$ ? What if the Hessian matrix is tridiagonal? Justify your answers.
Answer to Question 4 |
From the definition of the Hessian matrix we have $$ H(f)=\left[\begin{array}{cccc} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{array}\right] $$ so here the Hessian is a $100000 \times 100000$ matrix. A full (dense) matrix of this size has $10^{10}$ entries, i.e. roughly 80 GB in double precision, which already exceeds the RAM of a typical computer. Moreover, each Newton step requires solving a dense linear system with this matrix, costing $O(n^3)$ operations, which is prohibitively slow at this dimension. Therefore, when the Hessian matrix is full we cannot use Newton's method to efficiently minimize the function $f$.
If the Hessian matrix is tridiagonal, recall the general Newton iteration: given a starting point $x_0$, run the recurrence $$ x_{i+1}=x_i-\left[D^2 f\left(x_i\right)\right]^{-1} \nabla f\left(x_i\right) \text {. } $$ The main cost is computing $\left[D^2 f\left(x_i\right)\right]^{-1} \nabla f\left(x_i\right)=(H(f))^{-1}\nabla f\left(x_i\right)$, which we never do by explicitly inverting $H(f)$; instead we solve the linear system $$ H(f)\,d = \nabla f\left(x_i\right) \Rightarrow Ax=b. $$ Since the Hessian matrix is symmetric and tridiagonal, we can use the Cholesky decomposition from MAA208 - Numerical Linear Algebra in its banded form: the Cholesky factor is bidiagonal, so the factorization and the two triangular solves only cost $O(n)$ operations and $O(n)$ memory. The detailed algorithm can be found in this link. In this case each Newton step is cheap, so Newton's method can be used efficiently, in sharp contrast with the full-Hessian case.
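A minimal sketch of such a Newton step (the tridiagonal matrix below is a toy SPD Hessian and the vector a stand-in gradient, chosen only for illustration), using scipy's banded Cholesky solver so that the system is solved in $O(n)$ time and memory:
import numpy as np
from scipy.linalg import solveh_banded

n = 100_000
diag = np.full(n, 4.0)                  # main diagonal of the (SPD) tridiagonal Hessian
off = np.full(n - 1, -1.0)              # sub/super-diagonal
g = np.random.default_rng(0).standard_normal(n)   # stand-in for the gradient

# upper-band storage expected by solveh_banded: row 0 = super-diagonal, row 1 = diagonal
ab = np.zeros((2, n))
ab[0, 1:] = off
ab[1, :] = diag

d = solveh_banded(ab, g)                # Newton direction: solves H d = g in O(n)
# sanity check: H d should reproduce the right-hand side
Hd = diag * d + np.r_[0.0, off * d[:-1]] + np.r_[off * d[1:], 0.0]
print('max residual of H d - g:', np.abs(Hd - g).max())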
Question 5
a) You use the Conjugate Gradient (CG) algorithm to solve the system $A x=b$ where $A$ is a $10^6 \times 10^6$ positive definite tridiagonal matrix (only the main diagonal and its neighbors are non-zero). What is the complexity of the CG algorithm in this case?
b) Suppose, moreover, that $A$ has exactly 100 distinct eigenvalues. According to the theoretical results shown in the course, what is the required number of iterations for $\mathrm{CG}$ to converge?
Answer to Question 5 |
Given a symmetric, positive-definite matrix $A$, solving the system $A x=b$ is equivalent to minimizing the quadratic function $$ f: x \mapsto \frac{1}{2} x^T A x-b \cdot x, $$ since the gradient of this quadratic function is $\nabla f(x)=A x-b$. Recall the Conjugate Gradient (CG) characterization for solving $A x=b$: let $A$ be a symmetric positive-definite matrix and $d_1, \ldots, d_n$ a (complete) system of $n$ non-zero $A$-orthogonal vectors; then the solution $x^*$ to the system $A x=b$ is given by the formula $$ x^*=\sum_{j=1}^n \frac{b^T d_j}{d_j^T A d_j} d_j. $$ Since $A$ is a $10^6 \times 10^6$ positive-definite tridiagonal matrix, each CG iteration costs $O(n)$ operations (the dominant cost is one matrix-vector product, and a tridiagonal matrix has only $O(n)$ non-zero entries), and CG terminates after at most $n$ iterations, so the overall complexity is $O(n^{2})$ in the worst case.
If $A$ has exactly 100 distinct eigenvalues, then by the theoretical result seen in the course CG converges (in exact arithmetic) in at most 100 iterations: the error after $k$ iterations is governed by a degree-$k$ polynomial in $A$, and a polynomial vanishing at the 100 distinct eigenvalues annihilates the error completely. A small illustration is given below.
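The following sketch illustrates this result on a toy problem (much smaller sizes and a matrix of my own construction, not the matrix of the question): a diagonal SPD matrix with only 5 distinct eigenvalues, for which CG should need about 5 iterations.
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
eigenvalues = rng.choice([1.0, 2.0, 5.0, 10.0, 50.0], size=2000)   # only 5 distinct eigenvalues
A = diags(eigenvalues)                  # diagonal SPD matrix (a simple sparse test case)
b = rng.standard_normal(2000)

iteration_count = []
x, info = cg(A, b, callback=lambda xk: iteration_count.append(1))
print('CG iterations:', len(iteration_count), ' converged (info == 0):', info == 0)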
Question 6
Consider the problem $$ \min _{g_i(x, y, z)=1} f(x, y, z) $$ with $f(x, y, z)=x^2+y^2+z^2, g_1(x, y, z)=2 x+3 y-z, g_2(x, y, z)=5 x-y+z$. Does this problem have a solution? Is it possible to write the optimality conditions given by the Lagrange multipliers?
Answer to Question 6 |
Here we have the two constraints $g_1(x, y, z)=2 x+3 y-z$ and $g_2(x, y, z)=5 x-y+z$, whose gradients are $$ \nabla g_1(x, y, z) = \left(\frac{\partial g_{1}}{\partial x}, \frac{\partial g_{1}}{\partial y}, \frac{\partial g_{1}}{\partial z}\right) = \left(2, 3, -1\right), \qquad \nabla g_2(x, y, z) = \left(\frac{\partial g_{2}}{\partial x}, \frac{\partial g_{2}}{\partial y}, \frac{\partial g_{2}}{\partial z}\right) = \left(5, -1, 1\right). $$ The feasible set $\{g_1 = 1,\ g_2 = 1\}$ is non-empty and closed (the intersection of two non-parallel planes, i.e. a line), and $f(x,y,z)=x^2+y^2+z^2$ is continuous and coercive, so the problem has a solution. Moreover, since the gradients of the constraints are linearly independent, the constraints are qualified and the Lagrange multiplier method applies. Assume Lagrange multipliers $\lambda_1$ and $\lambda_2$; the Lagrangian is $$ \begin{aligned} L(x, y, z, \lambda_1, \lambda_2) &= f(x, y, z) + \lambda_1 (g_1(x, y, z)-1) + \lambda_2 (g_2(x, y, z)-1) \\ &= x^2+y^2+z^2 + \lambda_1 (2 x+3 y-z-1) + \lambda_2 (5 x-y+z-1). \end{aligned} $$ The optimality conditions are $$ \begin{cases} \nabla f(\mathbf{x})+\lambda_1 \nabla g_1(\mathbf{x})+\lambda_2 \nabla g_2(\mathbf{x}) =\mathbf{0}, \\ g_1(\mathbf{x}) =1 \\ g_2(\mathbf{x}) =1 . \end{cases} \Rightarrow \begin{cases} (2x, 2y, 2z) + \lambda_1 (2, 3, -1) + \lambda_2 (5, -1, 1) = (0, 0, 0) \\ 2x+3y-z = 1 \\ 5x-y+z = 1 \end{cases} $$ i.e. the linear system $$ \begin{cases} 2x + 2\lambda_1 + 5\lambda_2 = 0 \\ 2y + 3\lambda_1 - \lambda_2 = 0 \\ 2z - \lambda_1 + \lambda_2 = 0 \\ 2x+3y-z = 1 \\ 5x-y+z = 1 \end{cases} $$ Solving it gives $\lambda_1 = -\frac{7}{57}$, $\lambda_2 = -\frac{8}{171}$, and the optimal point $(x, y, z) = (\frac{41}{171}, \frac{55}{342}, -\frac{13}{342})$. So it is indeed possible to write (and solve) the optimality conditions given by the Lagrange multipliers.
x, y, z, lambda_1, lambda_2 = symbols('x y z lambda_1 lambda_2')
eq1 = Eq(2*x + 2*lambda_1 + 5*lambda_2, 0)
eq2 = Eq(2*y + 3*lambda_1 - lambda_2, 0)
eq3 = Eq(2*z - lambda_1 + lambda_2, 0)
eq4 = Eq(2*x + 3*y - z, 1)
eq5 = Eq(5*x - y + z, 1)
solutions = solve((eq1, eq2, eq3, eq4, eq5), (x, y, z, lambda_1, lambda_2))
print('lambda_1 =', solutions[lambda_1], ', lambda_2 =', solutions[lambda_2], 'Optimal point:', (solutions[x], solutions[y], solutions[z]))
lambda_1 = -7/57 , lambda_2 = -8/171 Optimal point: (41/171, 55/342, -13/342)
Question 7
Consider the problem $$ \min _{g_i(x) \leq 0} x^2+y^2, $$ with $g_1(x)=x+y-1$ and $g_2(x)=y-x-1$. What is the optimal solution to this problem? Can the optimality conditions be written at the optimum? What is the corresponding optimality condition for this problem?
Answer to Question 7 |
Since the objective function is $x^2+y^2$, the optimal solution is $(x, y) = (0, 0)$, which satisfies both constraints strictly: $g_{1}(0,0) = -1 < 0$ and $g_{2}(0,0) = -1 < 0$. The optimality (KKT) conditions can therefore be written at the optimum. The Lagrangian is $$ L(x, y, \lambda_1, \lambda_2) = x^2+y^2 + \lambda_1 (x+y-1) + \lambda_2 (y-x-1), $$ and the optimality conditions read $$ \begin{cases} \nabla f(\mathbf{x})+\lambda_1 \nabla g_1(\mathbf{x})+\lambda_2 \nabla g_2(\mathbf{x}) =\mathbf{0}, \\ \lambda_1 \geq 0,\ \lambda_2 \geq 0,\quad g_1(\mathbf{x}) \leq 0,\ g_2(\mathbf{x}) \leq 0, \\ \lambda_1 g_1(\mathbf{x}) = 0,\quad \lambda_2 g_2(\mathbf{x}) = 0 . \end{cases} $$ At $(0,0)$ both constraints are inactive, so complementary slackness forces $\lambda_1 = \lambda_2 = 0$, and the first condition reduces to $\nabla f(0,0) = (0,0)$, which indeed holds. Hence the optimality condition for this problem is simply $\nabla f(x^*) = 0$ with zero multipliers: the constraints play no role at the optimum.
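A quick numerical cross-check (added as an illustration, using SLSQP; note that scipy's 'ineq' constraints are of the form $g(x) \geq 0$, so we pass $-g_i$):
import numpy as np
from scipy.optimize import minimize

objective = lambda v: v[0] ** 2 + v[1] ** 2
constraints = [
    {'type': 'ineq', 'fun': lambda v: 1 - v[0] - v[1]},   # g1(x, y) = x + y - 1 <= 0
    {'type': 'ineq', 'fun': lambda v: 1 + v[0] - v[1]},   # g2(x, y) = y - x - 1 <= 0
]
res = minimize(objective, x0=[1.5, -0.5], method='SLSQP', constraints=constraints)
print('minimizer:', res.x)
print('g1 =', res.x[0] + res.x[1] - 1, ', g2 =', res.x[1] - res.x[0] - 1, '(both inactive)')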
def f(x, y):
return x**2 + y**2
plt.figure(figsize=(6, 6))
x = np.linspace(-2, 2, 400)
y = np.linspace(-2, 2, 400)
x, y = np.meshgrid(x, y)
z = f(x, y)
plt.contour(x, y, z, levels=20)
x_boundary = np.linspace(-2, 2, 400)
y_boundary1 = 1 - x_boundary
y_boundary2 = x_boundary + 1
# plot the constraint boundaries
plt.plot(x_boundary, y_boundary1, label='x + y - 1 = 0')
plt.plot(x_boundary, y_boundary2, label='y - x - 1 = 0')
plt.fill_between(x_boundary, np.minimum(y_boundary1, y_boundary2), -2, where=(x_boundary >= -2) & (x_boundary <= 2), color='gray', alpha=0.5)
plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
Question 8
Let $f: \mathbb{R}^N \rightarrow \mathbb{R}$ be a function you want to minimize. Indicate which optimization algorithm presented in the course you would use to search for the minimum and justify your choice.
(i) $N=1000$, you know the gradient and the Hessian of $f$. You observe that the Hessian matrix is full.
(ii) $N=10^6$, you know the gradient, the function is ill-conditioned.
(iii) $N=3$, you only have access to function values.
(iv) $N=1000$, the function $f$ is represented as a sum of squares of $C^1$ functions.
(v) You minimize a $C^1$ function (gradients are available) with some linear constraints on the variables.
(vi) You minimize a convex $C^1$ function (gradients are available) with some non-linear convex equality constraints on the variables.
Answer to Question 8 |
(i) L-BFGS algorithm. Here $N=1000$ is fairly large and the Hessian is full, so applying Newton's method requires a substantial amount of computation and storage. The L-BFGS algorithm approximates the inverse Hessian using only a limited amount of memory, which is suitable for large $N$.
(ii) Conjugate Gradient method. Here the dimension of the problem is very large and the function is ill-conditioned. The Conjugate Gradient method is a good choice: it is an iterative algorithm that only produces an approximation of the solution using gradient information, which is often good enough for very large $N$.
(iii) Nelder-Mead method. Since we only have access to function values, we need a gradient-free (zero-order) method, and for $N=3$ the Nelder-Mead simplex method works well.
(iv) Gauss-Newton method. Since $f$ is a sum of squares of $C^1$ functions (a non-linear least-squares problem), we can compute the Jacobian of the residuals and use it to approximate the Hessian, so the Gauss-Newton method is the natural choice.
(v) Projected GD. Since all the constraints are linear, the projection is cheap to compute; for linear equality constraints $Ax=b$ it is given explicitly by $$ P_K(y)=x^*=y+A^T\left(A A^T\right)^{-1}(b-A y). $$ Therefore the Gradient Descent algorithm with projection can be implemented efficiently (see the short sketch after this list).
(vi) Lagrange multipliers. Since we minimize a convex $C^1$ function with non-linear convex equality constraints, we can write the Lagrangian optimality conditions and solve them, whereas projected GD performs badly on non-linear constraints (the projection is no longer explicit).
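To illustrate the projection formula in item (v), here is a minimal sketch with toy data (the matrix A, the vector b and the point y below are arbitrary, chosen only for the illustration):
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 6))     # 3 linear equality constraints in R^6
b = rng.standard_normal(3)
y = rng.standard_normal(6)

# projection of y onto the affine set {x : A x = b}
proj = y + A.T @ np.linalg.solve(A @ A.T, b - A @ y)
print('constraint residual after projection:', np.linalg.norm(A @ proj - b))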
The objective is to find the global minimum of the function $$ f(x, y)=e^{\sin (50 x)}+\sin \left(60 e^y\right)+\sin (70 \sin x)+\sin (\sin (80 y))-\sin (10(x+y))+\frac{x^2+y^2}{4}, $$ where $e^X$ denotes the usual exponential function $\exp (X)$.
from mpl_toolkits.mplot3d import Axes3D
def f(x, y):
return np.exp(np.sin(50 * x)) + np.sin(60 * np.exp(y)) + np.sin(70 * np.sin(x)) + np.sin(np.sin(80 * y)) - np.sin(10 * (x + y)) + (x**2 + y**2) / 4
Question 1
Prove that the function $f$ admits global minimizers on $\mathbb{R}^2$.
We apply Proposition 2.1. The function $f$ is continuous on $\mathbb{R}^2$, and the oscillating part is bounded: $$ \left|g(x,y)\right|=\left|e^{\sin (50 x)}+\sin \left(60 e^y\right)+\sin (70 \sin x)+\sin (\sin (80 y))-\sin (10(x+y)) \right| \leq e + 4. $$ Therefore $$ f(x,y) \geq \frac{x^2+y^2}{4} - (e+4) \longrightarrow +\infty \quad \text{as } \|(x,y)\| \rightarrow \infty, $$ so $f$ is coercive and continuous, and hence admits global minimizers on $\mathbb{R}^2$.
Question 2
Prove that the global minimizer lies in the region $[-4, 4]^{2}$
print('The value of 1/e is ', 1 / np.exp(1))
print('The value of f(-0.5, 0) is', f(-0.5, 0))
The value of 1/e is  0.36787944117144233
The value of f(-0.5, 0) is -0.8999682589251161
We prove this by contradiction. Suppose the global minimizer $(x^{*},y^{*})$ lies outside the region $[-4, 4]^{2}$; then it suffices to exhibit a point inside $[-4, 4]^{2}$ with a smaller value of $f$. The oscillating part satisfies $$ g(x,y)=e^{\sin (50 x)}+\sin \left(60 e^y\right)+\sin (70 \sin x)+\sin (\sin (80 y))-\sin (10(x+y)) \geq -4 + \frac{1}{e}. $$ Moreover, if $(x,y)$ lies outside the region $[-4, 4]^{2}$ then $x^{2}+y^{2} > 16$, so $$ \frac{x^{2}+y^{2}}{4} \geq 4, $$ and therefore $$ f(x,y) \geq -4 + \frac{1}{e} + 4 = \frac{1}{e} \quad \text{when } (x,y) \notin [-4, 4]^{2}. $$ Since $f(-0.5,0)=-0.899968 < \frac{1}{e}$ (computed above), this contradicts the assumption that the minimizer lies outside $[-4,4]^{2}$, which finishes the proof. *Q.E.D.*
Question 3
Make a plot of the function values in the region indicated above and conjecture a more precise location of the global minimum.
x = np.linspace(-4, 4, 200)
y = np.linspace(-4, 4, 200)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
# plot the 2D and 3D figures
plt.figure(figsize=(8, 6))
plt.contour(X, Y, Z, levels=20)
plt.colorbar()
plt.xlim(-4, 4)
plt.ylim(-4, 4)
plt.xlabel('x')
plt.ylabel('y')
plt.title('Contour plot of f(x, y)')
plt.show()
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_zlabel('Z-axis')
ax.set_title('3D plot of f(x, y)')
plt.show()
Question 4
What terms in the expression of the objective function generate the oscillatory behavior? Justify your answer.
The terms of the objective function that generate the oscillatory behavior are the following:
$e^{\sin(50x)}$: the sine inside the exponent oscillates between $-1$ and $1$ as $x$ varies, so this term oscillates rapidly between $e^{-1}$ and $e^{1}$ (frequency 50).
$\sin(60e^y)$: oscillates between $-1$ and $1$, and faster and faster as $y$ grows, since its argument $60e^y$ increases exponentially.
$\sin(70\sin x)$: oscillates between $-1$ and $1$ (frequency 70 in its inner argument).
$\sin(\sin(80y))$: oscillates between $-\sin(1)$ and $\sin(1)$ (frequency 80).
$-\sin(10(x+y))$: oscillates between $-1$ and $1$ (frequency 10 along the diagonal direction).
# plot the graph sin(60e^y)
def f_plot(x):
return np.sin(60 * np.exp(x))
x = np.linspace(-4, 1, 5000)  # dense sampling: the argument 60*exp(x) oscillates faster and faster
y = f_plot(x)
plt.figure(figsize=(8, 6))
plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.title(r'Plot of $\sin(60e^x)$')
plt.show()
Question 5
Having restricted the search region to $[-4,4]^2$, define an $N \times N$ uniform grid on this region and evaluate the function $f$ at each of these nodes. In each case find and print the minimal value attained by $f$ on the current grid. See how the results change for $N = 2^{k} + 1,\ 4 \leq k \leq 10$. (Use vectorization, not loops, to accelerate the computations.)
list_min_posi = []
def evaluate_function_on_grid(N):
x = np.linspace(-4, 4, N)
y = np.linspace(-4, 4, N)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
return Z
def find_min_values():
    for k in range(4, 11):
        N = 2 ** k + 1
        Z = evaluate_function_on_grid(N)
        min_value = np.min(Z)
        # np.where returns (row, column) indices; rows correspond to y, columns to x
        row_index, col_index = np.where(Z == min_value)
        x_min = col_index[0] / (N - 1) * 8 - 4
        y_min = row_index[0] / (N - 1) * 8 - 4
        list_min_posi.append((x_min, y_min))
        print(f"For k={k}, N = {N}, the minimum value of f(x, y) is {min_value} at the point ({x_min}, {y_min})")
find_min_values()
For k=4, N = 17, the minimum value of f(x, y) is -1.7612874733047068 at the point (-0.5, -0.5)
For k=5, N = 33, the minimum value of f(x, y) is -1.8463465039832796 at the point (1.25, -0.5)
For k=6, N = 65, the minimum value of f(x, y) is -1.8463465039832796 at the point (1.25, -0.5)
For k=7, N = 129, the minimum value of f(x, y) is -2.449624019189007 at the point (0.0625, -0.5)
For k=8, N = 257, the minimum value of f(x, y) is -2.943575583284294 at the point (0.34375, -0.09375)
For k=9, N = 513, the minimum value of f(x, y) is -3.163951672593782 at the point (-0.390625, -0.09375)
For k=10, N = 1025, the minimum value of f(x, y) is -3.303518906855208 at the point (-0.0234375, 0.2109375)
Question 6
For each $4 \leq k \leq 10$, pick the point $x_0$ of the previously found grids which gives the lowest value for the function $f$. Then choose a gradient-based algorithm and run it starting from the point $x_0$, observing what is the corresponding minimal value.
Here we use the Gradient Descent method with variable step size to find the minimum value of $f(x,y)$ (for the two finest grids we use scipy's L-BFGS-B instead, as in the code below).
def gradientVariableStep(f,df,x_init,step=0.001,tol=1e-06,maxiter=20000):
maxstep = 10
x=x_init.copy()
xtab=[]
ftab=[]
xtab.append(x)
pval = f(x)
ftab.append(pval)
it=0
g=df(x)
    while (it == 0) or (step > tol and it < maxiter and np.linalg.norm(xtab[-1] - xtab[-2]) > tol):
actx= x-step*g
val = f(actx)
if(val<pval):
#accept iteration
x = actx
pval = val
step = min(1.1*step,maxstep)
xtab.append(x)
ftab.append(val)
g = df(x)
it=it+1
else:
step = 0.8*step
# boolean to indicate the convergence
if(it==maxiter):
conv = False
else:
conv = True
return xtab, ftab, conv
def f_np(x):
return np.exp(np.sin(50 * x[0])) + np.sin(60 * np.exp(x[1])) + np.sin(70 * np.sin(x[0])) + np.sin(np.sin(80 * x[1])) - np.sin(10 * (x[0] + x[1])) + (x[0]**2 + x[1]**2) / 4
def Gradf_np(x):
    return np.array([50*np.exp(np.sin(50*x[0]))*np.cos(50*x[0])+70*np.cos(70*np.sin(x[0]))*np.cos(x[0])-10*np.cos(10*(x[0]+x[1]))+x[0]/2, 60*np.exp(x[1])*np.cos(60*np.exp(x[1]))+80*np.cos(np.sin(80*x[1]))*np.cos(80*x[1])-10*np.cos(10*(x[0]+x[1]))+x[1]/2])
minimal_values = []
minimizer = []
for i in range(len(list_min_posi)):
x_init = np.array(list_min_posi[i])
if i < len(list_min_posi) - 2:
xtab, ftab, conv = gradientVariableStep(f_np, Gradf_np, x_init, step=0.001, tol=1e-06, maxiter=200)
print(f'For initial point {x_init}, the algorithm converges in {len(xtab)} iterations')
print('Is the algorithm converges? ', conv)
print(f'The minimum value of f(x, y) is {ftab[-1]} at the point ({xtab[-1][0]}, {xtab[-1][1]})')
print('-' * 50)
minimal_values.append(ftab[-1])
minimizer.append(xtab[-1])
else:
res = minimize(f_np, x_init, method='L-BFGS-B', tol=1e-06)
print(f'For initial point {x_init}, the algorithm converges in {res.nit} iterations')
print('Is the algorithm converges? ', res.success)
print(f'The minimum value of f(x, y) is {res.fun} at the point ({res.x[0]}, {res.x[1]})')
print('-' * 50)
minimal_values.append(res.fun)
minimizer.append(res.x)
For initial point [-0.5 -0.5], the algorithm converges in 4 iterations
Is the algorithm converges?  True
The minimum value of f(x, y) is -1.8340181272776597 at the point (-0.5109105553704272, -0.49506955611011516)
--------------------------------------------------
For initial point [ 1.25 -0.5 ], the algorithm converges in 2 iterations
Is the algorithm converges?  True
The minimum value of f(x, y) is -1.878771216558539 at the point (1.2352881097083996, -0.4943874516804751)
--------------------------------------------------
For initial point [ 1.25 -0.5 ], the algorithm converges in 2 iterations
Is the algorithm converges?  True
The minimum value of f(x, y) is -1.878771216558539 at the point (1.2352881097083996, -0.4943874516804751)
--------------------------------------------------
For initial point [ 0.0625 -0.5 ], the algorithm converges in 8 iterations
Is the algorithm converges?  True
The minimum value of f(x, y) is -2.839623976975017 at the point (0.07331443185636564, -0.49676114392025295)
--------------------------------------------------
For initial point [ 0.34375 -0.09375], the algorithm converges in 2 iterations
Is the algorithm converges?  True
The minimum value of f(x, y) is -2.946970693908365 at the point (0.3407463345919834, -0.09436988349901183)
--------------------------------------------------
For initial point [-0.390625 -0.09375 ], the algorithm converges in 3 iterations
Is the algorithm converges?  True
The minimum value of f(x, y) is -3.2081391294819426 at the point (-0.39452567638237185, -0.09320021707807923)
--------------------------------------------------
For initial point [-0.0234375  0.2109375], the algorithm converges in 3 iterations
Is the algorithm converges?  True
The minimum value of f(x, y) is -3.3068686474609335 at the point (-0.024403099159344636, 0.21061247893447854)
--------------------------------------------------
Question 7
Observe how the minimal value found numerically and the point where the minimum is attained change with respect to $k \in \left\{4, ..., 10\right\}$. What can you conclude?
# plot the graph of minimal values
plt.figure(figsize=(8, 6))
plt.plot(np.arange(4, 11), minimal_values, 'o-', color='red')
plt.xlabel('k')
plt.ylabel('minimal values')
plt.title('The graph of minimal values')
plt.show()
# plot the trajectory of minimizer and the points corresponding to the minimal values and order
plt.figure(figsize=(8, 6))
for i in range(len(list_min_posi)):
if i == 0:
plt.plot(minimizer[i][0], minimizer[i][1], 'o-', color='red', label='Trajectory')
else:
plt.plot(minimizer[i][0], minimizer[i][1], 'o-', color='red')
plt.text(minimizer[i][0]-0.05, minimizer[i][1]+0.01, f'{i+1}', fontsize=15)
if i > 0:
plt.annotate('', xy=minimizer[i], xytext=minimizer[i-1], arrowprops=dict(facecolor='black', shrink=0.02, width=1, headwidth=5))
plt.xlabel('x')
plt.ylabel('y')
plt.title('The trajectory of minimizer and the points corresponding to the minimal values and order')
plt.legend()
plt.show()
We can see that as $k$ increases, the minimal value found numerically decreases and the point where the minimum is attained moves around. From the trajectory of the minimizers we can guess that the minimizer of the function lies around $(0, 0.2)$.
Question 8
In view of the previous observation what could be the global minimum of the function $f$ on $\mathbb{R}^2$ ? What arguments can you give to support your claim?
From the previous observations we can conclude that the global minimum of the function $f$ on $\mathbb{R}^2$ is approximately $f_{\min} \simeq -3.3069$, attained near $(-0.024,\ 0.211)$. Two arguments support this claim: first, by Question 2 the global minimizer must lie in $[-4,4]^2$, so a grid search on that square cannot miss the basin of the global minimum once the grid is fine enough to resolve the oscillations; second, the value found stabilizes as $k$ increases, and a local descent started from the best grid point keeps converging to the same point and value. A refinement with tight tolerances is shown below.
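As an extra supporting check (the solver and tolerances below are an arbitrary choice), we polish the best point found above with a tight-tolerance local method; f_np is the function defined earlier and the starting point is taken from the previous results.
import numpy as np
from scipy.optimize import minimize

best = minimize(f_np, np.array([-0.024, 0.211]), method='Nelder-Mead',
                options={'xatol': 1e-12, 'fatol': 1e-12})
print('refined minimum value:', best.fun)
print('attained at:', best.x)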
Consider the function $J:[0, \infty)^2 \rightarrow \mathbb{R}$ defined by $$ J(x, y)=\cosh (4 x+y) $$ Recall that the hyperbolic cosine function is defined by $\cosh (x)=(\exp (x)+\exp (-x)) / 2$. Consider the following sets $$ \begin{gathered} K_1=\left\{(x, y) \in[0, \infty)^2: x y \geq 1\right\} \\ K_2=\left\{(x, y) \in[0, \infty)^2: y \leq 1+x, y \geq 2-x\right\} . \end{gathered} $$
Question 10
Prove that the sets $K_1$ and $K_2$ are convex. Are the sets $K_1$ and $K_2$ closed? Is the function $J$ convex? Finally, prove that $J$ admits minimizers on $K_1$. What can you say about existence of minimizers on $K_2$ ?
# plot the graph of cosh(x)
x = np.linspace(-2, 2, 100)
y = np.cosh(x)
plt.figure(figsize=(8, 6))
plt.plot(x, y, color='red')
plt.xlabel('x')
plt.ylabel('y')
plt.title('The graph of cosh(x)')
plt.show()
1. Convexity
We recall the definition of a convex set: a set $C$ is convex if the line segment between any two points of $C$ lies in $C$, i.e. $\forall x_1, x_2 \in C, \forall \theta \in[0,1]$, $$ \theta x_1+(1-\theta) x_2 \in C. $$ We first prove that $K_1$ is convex. Suppose $(x_1,y_1),(x_2,y_2)\in K_1$, so $x_1y_1\geq 1$ and $x_2y_2\geq 1$, with all coordinates non-negative. For any $\lambda \in [0,1]$ set $$ \begin{aligned} x_\lambda &= \lambda x_1 + (1-\lambda)x_2, \\ y_\lambda &= \lambda y_1 + (1-\lambda)y_2. \\ \end{aligned} $$ Then $$ x_\lambda y_\lambda = \lambda^{2} x_{1}y_{1} + (1-\lambda)^{2} x_{2}y_{2} + \lambda(1-\lambda)(x_1y_2 + x_2y_1). $$ By the AM-GM inequality, $x_1y_2 + x_2y_1 \geq 2\sqrt{x_1y_1\,x_2y_2}$, so $$ x_\lambda y_\lambda \geq \lambda^{2} x_{1}y_{1} + (1-\lambda)^{2} x_{2}y_{2} + 2\lambda(1-\lambda)\sqrt{x_1y_1}\sqrt{x_2y_2} = \left(\lambda\sqrt{x_1y_1} + (1-\lambda)\sqrt{x_2y_2}\right)^{2} \geq \left(\lambda + (1-\lambda)\right)^{2} = 1. $$ Therefore $(x_\lambda, y_\lambda) \in K_1$ for any $\lambda \in [0,1]$, which shows that $K_1$ is convex.
We then prove that $K_2$ is convex: Suppose $(x_1,y_1),(x_2,y_2)\in K_2$. Then we have $2-x_{1} \leq y_{1} \leq 1+x_{1}$ and $2-x_{2} \leq y_{2} \leq 1+x_{2}$. For any $\lambda \in [0,1]$, we have $$ \begin{aligned} x_\lambda &= \lambda x_1 + (1-\lambda)x_2 \\ y_\lambda &= \lambda y_1 + (1-\lambda)y_2 \\ \end{aligned} $$ Then we have $$ \begin{aligned} y_\lambda &\leq \lambda(1+x_1) + (1-\lambda)(1+x_2) \\ &= 1 + x_\lambda \end{aligned} $$ and $$ \begin{aligned} y_\lambda &\geq \lambda(2-x_1) + (1-\lambda)(2-x_2) \\ &= 2 - x_\lambda \end{aligned} $$ Therefore, $(x_\lambda, y_\lambda) \in K_2$ for any $\lambda \in [0,1]$, which implies that $K_2$ is convex.
2. Closed
Here we apply the definition of closed sets from MAA202 - Topology and Multivariable calculus.
We first prove that $K_1$ is closed. Indeed, let $\left(z_n\right)_{n \in \mathbb{N}} \in K_{1}^{\mathbb{N}}$ be a sequence converging to $z_{\infty}=\left(x_{\infty}, y_{\infty}\right)$ in $\mathbb{R}^2$. We want to show that $z_{\infty} \in K_{1}$. Since $z_n \in K_{1}$ for all $n \in \mathbb{N}$, we can write $z_n=\left(x_n, y_n\right)$ with $$ \forall n \in \mathbb{N}, x_n \cdot y_n \geq 1 . $$ By the previous convergence, we have $$ x_n \underset{n \rightarrow+\infty}{\longrightarrow} x_{\infty} \text { and } y_n \underset{n \rightarrow+\infty}{\longrightarrow} y_{\infty}, $$ therefore we can pass to the limit in the previous inequality to get $$ x_{\infty} \cdot y_{\infty} \geq 1, $$ which means that $z_{\infty} \in K_{1}$.
We then prove that $K_2$ is closed. With the same method one shows that the sets $\{(x,y) \in \mathbb{R}^2: y \leq 1+x\}$ and $\{(x,y) \in \mathbb{R}^2: y \geq 2-x\}$ are closed (as is $[0,\infty)^2$). Therefore their intersection $K_2$ is also closed, being a finite intersection of closed sets (a proposition from MAA202 - Topology).
3. Convexity of $J$
$J$ is convex: to show this, we need to prove that for any $(x_1,y_1),(x_2,y_2) \in [0,\infty)^2$ and any $\lambda \in [0,1]$, we have $$ J(\lambda(x_1,y_1) + (1-\lambda)(x_2,y_2)) \leq \lambda J(x_1,y_1) + (1-\lambda)J(x_2,y_2). $$ We have $$ \begin{aligned} \cosh\bigl(4(\lambda x_1 + (1-\lambda)x_2) + \lambda y_1 + (1-\lambda)y_2\bigr) =& \frac{1}{2} \left(\exp\left(4(\lambda x_1 + (1-\lambda)x_2) + \lambda y_1 + (1-\lambda)y_2\right) + \exp\left(-4(\lambda x_1 + (1-\lambda)x_2) - \lambda y_1 - (1-\lambda)y_2\right)\right) \\ \leq& \frac{1}{2} \left(\lambda \exp\left(4x_1 + y_1\right) + (1-\lambda) \exp\left(4x_2 + y_2\right) + \lambda \exp\left(-4x_1 - y_1\right) + (1-\lambda) \exp\left(-4x_2 - y_2\right)\right) \\ =& \lambda \cosh(4x_1+y_1) + (1-\lambda) \cosh(4x_2+y_2) \\ =& \lambda J(x_1,y_1) + (1-\lambda)J(x_2,y_2), \end{aligned} $$ where we used the convexity of the exponential function. (Equivalently, $J$ is convex because it is the composition of the convex function $\cosh$ with the affine map $(x,y) \mapsto 4x+y$.)
4. Existence of a solution
$J$ admits minimizers on $K_1$: the set $K_1$ is closed (proved above) and $J$ is continuous; moreover, on $[0,\infty)^2$ we have $4x+y \to +\infty$ as $\|(x,y)\| \to \infty$, so $J(x,y)=\cosh(4x+y) \to +\infty$, i.e. $J$ is coercive on $K_1$. A continuous coercive function on a non-empty closed set attains its minimum, so $J$ admits a minimizer on $K_1$. To locate it, we apply the Lagrange multiplier method on the boundary $xy = 1$: with the constraint $g(x,y) = xy - 1$, the stationarity conditions are $$ \begin{aligned} \nabla J(x,y) &= \lambda \nabla g(x,y), \\ g(x,y) &= 0, \end{aligned} $$ where $\lambda$ is the Lagrange multiplier, and $$ \begin{aligned} \nabla J(x,y) &= (4\sinh(4x+y), \sinh(4x+y)), \\ \nabla g(x,y) &= (y,x). \end{aligned} $$ Therefore the stationary points solve $$ \begin{aligned} 4\sinh(4x+y) &= \lambda y, \\ \sinh(4x+y) &= \lambda x, \\ xy &= 1. \end{aligned} $$ Dividing the first two equations gives $y=4x$, and then $xy=4x^{2}=1$, so $x = \frac{1}{2}$ and $y = 2$. The stationary point, and hence the minimizer, is $(x,y) = (\frac{1}{2}, 2)$.
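A numerical cross-check of this stationary point (an illustrative sketch; the choice of SLSQP and of the starting point is arbitrary), minimizing $J$ under the constraint $xy \geq 1$ with bounds $x, y \geq 0$:
import numpy as np
from scipy.optimize import minimize

J_obj = lambda v: np.cosh(4 * v[0] + v[1])
constraint = {'type': 'ineq', 'fun': lambda v: v[0] * v[1] - 1}   # xy - 1 >= 0
res = minimize(J_obj, x0=[1.0, 1.0], method='SLSQP',
               bounds=[(0, None), (0, None)], constraints=[constraint])
print('minimizer:', res.x, ' value:', res.fun)
print('cosh(4) =', np.cosh(4.0))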
From the plots below we can see that $J$ also attains a minimal value on $K_2$: indeed $K_2$ is closed and $J$ is coercive on it as well, so the same existence argument applies.
Question 11
Use Python to plot the behavior of the function $J$ on the sets $K_1$ and $K_2$ (separately). For unbounded sets just choose a bounding region of the type $[0, M]^2$. Conjecture the location of the minimizers and give a more precise answer regarding the existence of minimizers of $J$ on $K_2$.
def J(x):
return np.cosh(4 * x[0] + x[1])
def constraint(x):
return 1 - x[0] * x[1]
def constraint2(x):
return x[0] + x[1] - 2
def constraint3(x):
return x[0] - x[1] + 1
plt.figure(figsize=(8, 6))
xmin=0
xmax=3
ymin=0
ymax=3
aX0=np.linspace(xmin,xmax,100)
aX1=np.linspace(ymin,ymax,100)
X, Y = np.meshgrid(aX0, aX1)
K2 = (Y <= X + 1) & (Y >= 2 - X)
Z=np.array([[J(np.array([x0,x1])) for x0 in aX0] for x1 in aX1])
ZG=np.array([[constraint(np.array([x0,x1])) for x0 in aX0] for x1 in aX1])
plt.contour(aX0,aX1,ZG,levels=0,colors='r',linewidths=2)
plt.contour(aX0,aX1,Z,12)
plt.axis('scaled')
plt.colorbar()
plt.xlabel('$x_0$')
plt.ylabel('$x_1$')
plt.title(r'Plot of $\cosh(4x+y)$ with constrain $K_{1}$')
plt.contour(X, Y, X * Y, levels=[1], colors="red", linewidths=2)
plt.contourf(X, Y, X * Y, levels=[1, 1e2], colors="red", alpha=0.1)
plt.text(0.5, 2.1, "$K_1$", fontsize=20, color="red")
plt.scatter(0.5, 2, c="black", marker="o", s=60, label="Conjectured Optimal Point")
plt.legend()
plt.show()
X, Y = np.meshgrid(aX0, aX1)
plt.figure(figsize=(8, 6))
ZG2=np.array([[constraint2(np.array([x0,x1])) for x0 in aX0] for x1 in aX1])
ZG3=np.array([[constraint3(np.array([x0,x1])) for x0 in aX0] for x1 in aX1])
plt.contourf(X, Y, K2, levels=[1e-10, 1], colors="green", alpha=0.1)
plt.text(0.7, 1.4, "$K_2$", fontsize=20, color="green")
plt.contour(aX0,aX1,ZG2,levels=0,colors='r',linewidths=2)
plt.contour(aX0,aX1,ZG3,levels=0,colors='r',linewidths=2)
plt.contour(aX0,aX1,Z,12)
plt.axis('scaled')
plt.colorbar()
plt.xlabel('$x_0$')
plt.ylabel('$x_1$')
plt.title(r'Plot of $\cosh(4x+y)$ with constrain $K_{2}$')
plt.show()
From the plot above we can see that the contour lines of $J$ are straight lines of slope $-4$: writing $\cosh(4x+y)=\cosh(m)$ with $m=4x+y$, a level set of $J$ is the line $4x+y=m$. Moreover, $\cosh$ is increasing on $[0, \infty)$ and $4x+y \geq 0$ on $[0,\infty)^2$, so minimizing $J$ amounts to finding the smallest $m$ for which the line $4x+y=m$ still touches the constraint set.
To find the minimizer we therefore look for the contact point between such a contour line and the set $K_1$. Writing the contour line as $y=-4x+c$ for a constant $c$, the contact point with the boundary curve $y=1/x$ is where the tangent has slope $-4$: $$ y' = \left(\frac{1}{x}\right)'=-\frac{1}{x^{2}}=-4 \;\Rightarrow\; x=\frac{1}{2},\quad y = 2. $$ By the same reasoning, for $K_2$ the smallest value of $4x+y$ is attained at the corner of the feasible region, i.e. at the intersection of the lines $y=1+x$ and $y=2-x$, namely the point $(0.5, 1.5)$.
Question 12
Compute the projection operator on the set $K_1$ : given $X$ in $\mathbb{R}^2$ find the closest point to $X$ which is in $K_1$. Approximate the solution of the problem $$ \min _{x \in K_1} J(x) $$ using the Projected Gradient algorithm.
We denote the point $X=(x_{0},y_{0})$. Then we have two cases:
If $X \in K_{1}$, then the closest point to $X$ is $X$ itself.
If $X \notin K_{1}$ (with $x_{0}, y_{0} \geq 0$), then the closest point is the foot of the normal dropped from $X$ onto the boundary curve $y=\frac{1}{x}$. Write this point as $Y=(m,\frac{1}{m})$; the tangent line at $Y$ is $y-\frac{1}{m}=-\frac{1}{m^{2}}(x-m)$, and the segment $XY$ must be orthogonal to it, i.e. its slope $\frac{\frac{1}{m}-y_{0}}{m-x_{0}}$ must equal $m^{2}$. This gives $$ \frac{\frac{1}{m}-y_{0}}{m-x_{0}} = m^{2} \;\Longleftrightarrow\; 2m^{4}-2x_{0}m^{3}+2y_{0}m-2=0, $$ a quartic equation in $m$ whose relevant (positive) root gives the projection. We could solve this equation by hand, but here we simply use the scipy and sympy packages for the computation.
def closest_distance(x1, y1):
x = sp.symbols('x')
equation = Eq(2 * x **4 - 2 * x1 * x **3 + 2 * y1 * x - 2, 0)
roots = sp.solve(equation)
    # the positive real root gives the projection point (m, 1/m) on the curve xy = 1
for root in roots:
if root.evalf() > 0:
return np.array([float(root.evalf()), float(1 / root.evalf())])
def distance(x1, y1):
return np.sqrt((x1 - result[0]) ** 2 + (y1 - result[1]) ** 2)
x1, y1 = 0.5, 1.0
result = closest_distance(x1, y1)
print(f'The closest point on the curve xy=1 to ({x1}, {y1}) is ({result[0]}, {result[1]}); the distance is {distance(x1, y1)}')
The closest point on the curve xy=1 to (0.5, 1.0) is (0.8216181575795043, 1.2171103946217674); the distance is 0.3880401560891017
# plot the graph
plt.figure(figsize=(8, 6))
Z=np.array([[J(np.array([x0,x1])) for x0 in aX0] for x1 in aX1])
ZG=np.array([[constraint(np.array([x0,x1])) for x0 in aX0] for x1 in aX1])
plt.contour(aX0,aX1,ZG,levels=0,colors='r',linewidths=2)
plt.contour(aX0,aX1,Z,12)
plt.axis('scaled')
plt.colorbar()
plt.xlabel('$x_0$')
plt.ylabel('$x_1$')
plt.title(r'Plot of $\cosh(4x+y)$ with constrain $K_{1}$')
plt.contour(X, Y, X * Y, levels=[1], colors="red", linewidths=2)
plt.contourf(X, Y, X * Y, levels=[1, 1e2], colors="red", alpha=0.1)
plt.text(0.5, 2.1, "$K_1$", fontsize=20, color="red")
plt.scatter(0.5, 2, c="black", marker="o", s=60, label="Conjectured Optimal Point")
# plot the point x1, y1 and result[0], result[1] and the connecting line
plt.scatter(x1, y1, c="black", marker="o", s=30, label="Initial Point")
plt.scatter(result[0], result[1], c="black", marker="o", s=30, label="Closest Point")
plt.plot([x1, result[0]], [y1, result[1]], c="black", linestyle="-", linewidth=1, label="Connecting Line")
plt.legend()
plt.show()
We have the Projected Gradient algorithm as follows:
GD with projection
Consider $K$ a closed and convex set in $\mathbb{R}^n$ and let $x_0 \in K$ be an initial point. The solution of the problem $$ \min _{x \in K} f(x) $$ may be approximated using the iterative algorithm $$ x_{i+1}=P_K\left(x_i-t \nabla f\left(x_i\right)\right) $$ for some $t>0$ small enough.
Then we have the Gradient of $J$ as follows: $$ \nabla J(x, y)=\left[\begin{array}{c} \frac{\partial J}{\partial x} \\ \frac{\partial J}{\partial y} \end{array}\right]=\left[\begin{array}{c} 4 \sinh (4 x+y) \\ \sinh (4 x+y) \end{array}\right] $$
def J(x):
return np.cosh(4 * x[0] + x[1])
def GradJ(x):
return np.array([4 * np.sinh(4 * x[0] + x[1]), np.sinh(4 * x[0] + x[1])])
def projG(x):
if x[0] * x[1] < 1:
return closest_distance(x[0], x[1])
else:
return np.array([x[0], x[1]])
def projection_gradient_algorithm(f, df, x_init, step=1e-03, tol=1e-06, max_iter=2000):
x = x_init.copy()
xtab = []
ftab = []
xtab.append(projG(x))
pval = f(x)
ftab.append(pval)
it = 0
g = df(x)
while (it == 0) or (
step > tol and it < max_iter and np.linalg.norm(xtab[-1] - xtab[-2]) > tol
):
x = projG(x - step * df(x))
xtab.append(x)
fval = f(x)
ftab.append(fval)
it = it + 1
if it == max_iter:
conv = False
else:
conv = True
return xtab, ftab, conv
x0 = np.array([1.0, 0.5])
xtab, ftab, conv = projection_gradient_algorithm(J, GradJ, x0, max_iter=1000)
print('Has the algorithm converged ? : ',conv)
plt.figure()
plt.plot(ftab)
plt.xlabel('iteration: $it$')
plt.ylabel('$J(it)$')
plt.show()
Has the algorithm converged ? : True
plt.figure(figsize=(8, 6))
aX0=np.linspace(xmin,xmax,100)
aX1=np.linspace(ymin,ymax,100)
Z=np.array([[J(np.array([x0,x1])) for x0 in aX0] for x1 in aX1])
ZG=np.array([[constraint(np.array([x0,x1])) for x0 in aX0] for x1 in aX1])
plt.contour(aX0,aX1,ZG,levels=0,colors='b',linewidths=2)
plt.contour(aX0,aX1,Z,25)
lx0=[X[0] for X in xtab]
lx1=[X[1] for X in xtab]
plt.plot(x0[0],x0[1],'-xr')
plt.plot(lx0,lx1,"-ro")
plt.title('Number of iterations. = '+str(np.shape(lx0)[0]))
plt.axis('scaled')
plt.colorbar()
plt.show()
print("The final position is : ", xtab[-1])
print("The value of the function at this point is : ", ftab[-1])
The final position is :  [0.50000921 1.99996316]
The value of the function at this point is :  27.30823285453461
Question 13
Show that the minimizer $\left(x_0, y_0\right)$ also minimizes the function $J_2(x, y)=4 x+y$ on $K_1$ and on $K_2$. Deduce that $$ \min _{x \in K_2} J(x) $$ is equivalent to a linear programming problem and solve it using a linear programming software (like scipy. optimize.linprog, see the examples on Moodle).
As shown in Question 11, the contour lines of $J$ are the lines $4x+y=m$ and $\cosh$ is increasing on $[0, \infty)$; since $4x+y \geq 0$ on $K_1$ and $K_2$, minimizing $J(x,y)=\cosh(4x+y)$ over either set is equivalent to minimizing $J_2(x,y)=4x+y$ over the same set, so the minimizer $(x_0,y_0)$ of $J$ also minimizes $J_2$. For the constraint set $K_{1}$ the constraint $xy \geq 1$ is not linear, so the problem is not a linear program (below we simply solve it with a general non-linear solver). For the constraint set $K_{2}$ all constraints are linear and the objective $J_2$ is linear, so $\min_{x \in K_2} J(x)$ is equivalent to a linear programming problem.
def f_1(x):
return 4 * x[0] + x[1]
def constraint1(x):
return x[0] * x[1] - 1
result = minimize(f_1, np.array([4, 3]), constraints={'fun': constraint1, 'type': 'ineq'})
print("Optimal Point:", result.x, "\tValue:", result.fun)
print('The value of function is ', np.cosh(result.fun))
Optimal Point: [0.49977422 2.00090346] 	Value: 4.0000003459285
The value of function is  27.308242276378245
Then we use linear programming for the constraint set $K_{2}$. The constraints $$ \begin{cases} y \leq x + 1\\ y \geq 2 - x \end{cases} \Rightarrow \begin{cases} -x + y \leq 1\\ -x -y \leq -2 \end{cases} $$ give the matrix form $$ \begin{bmatrix} -1 & 1\\ -1 & -1 \end{bmatrix} \begin{bmatrix} x\\ y \end{bmatrix} \leq \begin{bmatrix} 1\\ -2 \end{bmatrix}. $$
c = [4, 1] # here is 4x + y
A = [[-1, 1], [-1, -1]]
b = [1, -2]
res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None))
print("Optimal Point:", res.x, "\tValue:", res.fun)
print('The value of function is ', np.cosh(res.fun))
print('-' * 50)
print(res)
Optimal Point: [0.5 1.5] 	Value: 3.5
The value of function is  16.572824671057315
--------------------------------------------------
        message: Optimization terminated successfully. (HiGHS Status 7: Optimal)
        success: True
         status: 0
            fun: 3.5
              x: [ 5.000e-01  1.500e+00]
            nit: 2
          lower:  residual: [ 5.000e-01  1.500e+00]
                 marginals: [ 0.000e+00  0.000e+00]
          upper:  residual: [       inf        inf]
                 marginals: [ 0.000e+00  0.000e+00]
          eqlin:  residual: []
                 marginals: []
        ineqlin:  residual: [ 0.000e+00  0.000e+00]
                 marginals: [-1.500e+00 -2.500e+00]
 mip_node_count: 0
 mip_dual_bound: 0.0
        mip_gap: 0.0
Question 14
Are there points for which $\nabla J(x, y)=0$. If no such points exist then what can we conclude regarding the minimum of $J$ on $K_1$ ? Give an exact formula for the minimum of $J$ on $K_1$ based on the previous facts.
# plot the graph of sinh(x)
x = np.linspace(-2, 2, 100)
y = np.sinh(x)
plt.figure(figsize=(8, 6))
plt.plot(x, y, color='red')
plt.xlabel('x')
plt.ylabel('y')
plt.title('The graph of sinh(x)')
plt.show()
We have $$ \nabla J(x, y)=\left[\begin{array}{c} \frac{\partial J}{\partial x} \\ \frac{\partial J}{\partial y} \end{array}\right]=\left[\begin{array}{c} 4 \sinh (4 x+y) \\ \sinh (4 x+y) \end{array}\right]. $$ Therefore $\nabla J(x,y)=0$ requires $\sinh(4x+y)=0$, i.e. $4x+y=0$, i.e. $y=-4x$. Since $(x,y)\in [0,+\infty)^{2}$, the only such point is $(0,0)$, and $(0,0) \notin K_1$ because $0 \cdot 0 < 1$. Hence $\nabla J$ never vanishes on $K_1$, so the minimum of $J$ on $K_1$ cannot be attained at an interior critical point and must lie on the boundary $\{xy=1\}$. By the computation in Question 11 the minimizer is $(\frac{1}{2},2)$, so the exact minimum of $J$ on $K_1$ is $$ \min_{K_1} J = J\left(\tfrac{1}{2},2\right)=\cosh(4). $$
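As a closing check (using the value printed by the projected gradient run above), the exact minimum $\cosh(4)$ agrees with the numerical result up to about $10^{-8}$:
import numpy as np

pg_value = 27.30823285453461            # J at the final iterate of the projected gradient run
print('cosh(4)            =', np.cosh(4.0))
print('projected gradient =', pg_value)
print('difference         =', abs(np.cosh(4.0) - pg_value))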