Gradient Derivation for Linear Regression

To find the derivative of the given cost function with respect to the parameters θ₀ and θ₁, we'll follow these steps. The cost function is:

\[ \text{Cost} = \frac{1}{2m} \sum_{i=1}^{m} (\theta_0 + \theta_1 x^{(i)} - y^{(i)})^2 \]

This is the mean squared error cost function commonly used in linear regression (the extra factor of 1/2 is conventional, chosen so that the 2 produced by differentiation cancels), where:

  • m is the number of training examples
  • x⁽ⁱ⁾ and y⁽ⁱ⁾ are the input and output of the i-th training example
  • θ₀ and θ₁ are the parameters of the linear model

Step 1: Rewrite the cost function

Let's rewrite the cost function for clarity:

\[ \text{Cost} = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 \]

where h_θ(x⁽ⁱ⁾) = θ₀ + θ₁x⁽ⁱ⁾ is the hypothesis function.
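As a concrete illustration, the hypothesis can be sketched in Python (the function and parameter names below are my own, not from the derivation):

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x
```

For example, with θ₀ = 1 and θ₁ = 2, the prediction at x = 3 is 1 + 2·3 = 7.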

Step 2: Compute the partial derivative with respect to θ₀

We want to find:

\[ \frac{\partial \text{Cost}}{\partial \theta_0} \]

Using the chain rule:

\[ \frac{\partial \text{Cost}}{\partial \theta_0} = \frac{1}{2m} \sum_{i=1}^{m} 2(h_\theta(x^{(i)}) - y^{(i)}) \cdot \frac{\partial}{\partial \theta_0} (h_\theta(x^{(i)}) - y^{(i)}) \]

Since ∂/∂θ₀(h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾) = 1, the factor of 2 cancels the 1/2 and the derivative simplifies to:

\[ \frac{\partial \text{Cost}}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \]

Step 3: Compute the partial derivative with respect to θ₁

We want to find:

\[ \frac{\partial \text{Cost}}{\partial \theta_1} \]

Again, using the chain rule:

\[ \frac{\partial \text{Cost}}{\partial \theta_1} = \frac{1}{2m} \sum_{i=1}^{m} 2(h_\theta(x^{(i)}) - y^{(i)}) \cdot \frac{\partial}{\partial \theta_1} (h_\theta(x^{(i)}) - y^{(i)}) \]

Since ∂/∂θ₁(h_θ(x⁽ⁱ⁾) - y⁽ⁱ⁾) = x⁽ⁱ⁾, the factor of 2 again cancels the 1/2 and the derivative simplifies to:

\[ \frac{\partial \text{Cost}}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)} \]

Final Derivatives

The partial derivatives of the cost function with respect to θ₀ and θ₁ are:

\[ \frac{\partial \text{Cost}}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \]

\[ \frac{\partial \text{Cost}}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) \cdot x^{(i)} \]
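These two formulas translate directly into code. A minimal sketch (function and variable names are my own choices) computing both partial derivatives over a dataset:

```python
def gradients(theta0, theta1, xs, ys):
    """Partial derivatives of the cost w.r.t. theta0 and theta1.

    d/d_theta0 = (1/m) * sum of errors
    d/d_theta1 = (1/m) * sum of (error * x)
    where error = h_theta(x) - y.
    """
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = sum(errors) / m
    d_theta1 = sum(e * x for e, x in zip(errors, xs)) / m
    return d_theta0, d_theta1
```

Note that when the hypothesis fits the data exactly (every error is zero), both derivatives are zero, as expected at a minimum of the cost.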

These derivatives are used in gradient descent to update the parameters θ₀ and θ₁ iteratively.
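The update rule itself is not derived above, but a minimal gradient-descent sketch using these derivatives might look like the following (the learning rate `alpha` and iteration count are assumed hyperparameters, not part of the derivation):

```python
def gradient_descent(xs, ys, alpha=0.1, iters=1000):
    """Fit theta0, theta1 by repeatedly stepping opposite the gradient."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iters):
        # Errors h_theta(x) - y for the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        d0 = sum(errors) / m
        d1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0 -= alpha * d0
        theta1 -= alpha * d1
    return theta0, theta1
```

On data generated from y = 2x + 1, this sketch recovers θ₀ ≈ 1 and θ₁ ≈ 2, provided `alpha` is small enough for the iteration to converge.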