Neural Networks

Overview

Neural networks are computational models inspired by the human brain's structure and function. They form the foundation of modern deep learning and have driven advances in machine learning across many domains.

Note: This content focuses on neural network fundamentals. For related topics:

  • Optimization: See optimization algorithms (api/content/deep_learning/fundamentals/optimization_algorithms.py)
  • Initialization: See initialization (api/content/deep_learning/fundamentals/initialization.py)
  • Backpropagation: See backpropagation (api/content/deep_learning/fundamentals/backpropagation.py)
  • Loss Functions: See loss functions (api/content/deep_learning/fundamentals/loss_functions.py)
  • Activation Functions: See activation functions (api/content/deep_learning/fundamentals/activation_functions.py)
  • Regularization: See regularization (api/content/deep_learning/fundamentals/regularization.py)

Key aspects:

  • Biological inspiration from neural systems
  • Ability to learn complex patterns from data
  • Universal function approximation capabilities
  • Scalability to large datasets and problems

Core Concepts

  • Network Architecture

    Neural networks consist of interconnected layers of neurons. The architecture involves:

    • Input Layer:
      • Receives raw data
      • Dimensionality matches input features
      • May include preprocessing steps
    • Hidden Layers:
      • Process and transform features
      • Apply non-linear transformations
      • Learn hierarchical representations
    • Output Layer:
      • Produces final predictions
      • Architecture depends on task type
      • Uses appropriate activation function

    Each connection has learnable parameters:

    • Weights: Learned through optimization
    • Biases: Offset values for flexibility
    • Initialization: Critical for training
    $$ z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]} $$

    $$ a^{[l]} = g(z^{[l]}) $$

    Where:

    • $z^{[l]}$ is the pre-activation at layer $l$
    • $W^{[l]}$ is the weight matrix
    • $a^{[l-1]}$ is the activation from the previous layer
    • $b^{[l]}$ is the bias vector
    • $g(\cdot)$ is the activation function
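
    To make the notation concrete, here is a minimal NumPy sketch of a single layer's computation (illustrative only; the helper name dense_layer and the 3-to-4 layer shape are our own choices, not part of any library):

    import numpy as np

    def dense_layer(a_prev, W, b, activation=np.tanh):
        # One layer: pre-activation z = W a_prev + b, then a = g(z)
        z = W @ a_prev + b
        a = activation(z)
        return a, z

    # Example: a layer with 4 units receiving 3 input features (column vectors)
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(4, 3))   # weight matrix W^[l]
    b = np.zeros((4, 1))                     # bias vector b^[l]
    a_prev = rng.normal(size=(3, 1))         # activation from the previous layer
    a, z = dense_layer(a_prev, W, b)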
  • Forward Propagation

    Forward propagation transforms input data into predictions:

    • Linear Operations:
      • Matrix multiplication (weights)
      • Vector addition (biases)
      • Batch processing for efficiency
    • Non-linear Transformations:
      • Activation functions (ReLU, sigmoid, etc.)
      • Feature transformation
      • Representation learning
    • Layer Composition:
      • Sequential processing
      • Skip connections (in advanced architectures)
      • Multi-path architectures
    $$ h_\theta(x) = g^{[L]}(W^{[L]}g^{[L-1]}(W^{[L-1]}...g^{[1]}(W^{[1]}x + b^{[1]})... + b^{[L-1]}) + b^{[L]}) $$
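
    A minimal sketch of this layer composition in plain NumPy (a toy forward pass under assumed 2-3-1 layer sizes; not an optimized or batched implementation):

    import numpy as np

    def forward_propagation(x, weights, biases, activation=np.tanh):
        # Chain the per-layer transforms a = g(W a + b) from input to output
        a = x
        for W, b in zip(weights, biases):
            a = activation(W @ a + b)
        return a

    # Example: a 2-3-1 network applied to a single input column vector
    rng = np.random.default_rng(1)
    weights = [rng.normal(scale=0.5, size=(3, 2)), rng.normal(scale=0.5, size=(1, 3))]
    biases = [np.zeros((3, 1)), np.zeros((1, 1))]
    y_hat = forward_propagation(rng.normal(size=(2, 1)), weights, biases)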
  • Training Process

    Neural networks learn through iterative optimization:

    • Forward Pass:
      • Compute predictions
      • Store intermediate activations
      • Apply regularization (dropout, etc.)
    • Loss Computation:
      • Compare predictions with targets
      • Add regularization terms
      • Scale for batch size
    • Backward Pass:
      • Compute gradients
      • Apply gradient clipping
      • Handle gradient flow
    • Parameter Updates:
      • Apply optimization algorithm
      • Update learning rate
      • Apply momentum/adaptive methods
    $$ \text{Loss} = \frac{1}{m}\sum_{i=1}^m L(h_\theta(x^{(i)}), y^{(i)}) + \lambda R(\theta) $$

    Where:

    • $L$ is the loss function
    • $R(\theta)$ is the regularization term
    • $\lambda$ is the regularization strength
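
    As a rough illustration, a regularized loss of this form could be computed as follows (a sketch assuming mean squared error for $L$ and an L2 penalty for $R(\theta)$; the helper name regularized_loss is ours):

    import numpy as np

    def regularized_loss(y_pred, y_true, params, lam=1e-3):
        # Data term: average per-example loss over the batch
        data_loss = np.mean((y_pred - y_true) ** 2)
        # Regularization term: lambda * sum of squared parameters
        reg_loss = lam * sum(np.sum(p ** 2) for p in params)
        return data_loss + reg_loss

    # Example with a tiny batch and two parameter matrices
    y_true = np.array([[1.0], [0.0], [1.0]])
    y_pred = np.array([[0.9], [0.2], [0.7]])
    params = [np.ones((3, 2)), np.ones((2, 1))]
    print(regularized_loss(y_pred, y_true, params))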

Implementation

  • Modern Neural Network Implementation

    Implementation of a neural network with modern practices:

    • Modular architecture design
    • Training best practices
    • Regularization techniques
    • Performance monitoring
    
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.nn import functional as F
    
    class ModernNeuralNetwork(nn.Module):
        def __init__(self, input_size, hidden_sizes, output_size, dropout_rate=0.5):
            super(ModernNeuralNetwork, self).__init__()
            # Layer definitions
            self.layers = nn.ModuleList()
            
            # Input layer
            self.layers.append(nn.Linear(input_size, hidden_sizes[0]))
            self.batch_norms = nn.ModuleList([nn.BatchNorm1d(hidden_sizes[0])])
            
            # Hidden layers
            for i in range(len(hidden_sizes)-1):
                self.layers.append(nn.Linear(hidden_sizes[i], hidden_sizes[i+1]))
                self.batch_norms.append(nn.BatchNorm1d(hidden_sizes[i+1]))
            
            # Output layer
            self.output_layer = nn.Linear(hidden_sizes[-1], output_size)
            
            # Regularization
            self.dropout = nn.Dropout(dropout_rate)
            
            # Initialize weights
            self._initialize_weights()
        
        def _initialize_weights(self):
            for layer in self.layers:
                nn.init.kaiming_normal_(layer.weight)
                nn.init.zeros_(layer.bias)
            nn.init.kaiming_normal_(self.output_layer.weight)
            nn.init.zeros_(self.output_layer.bias)
        
        def forward(self, x):
            # Forward pass with modern practices
            for layer, batch_norm in zip(self.layers, self.batch_norms):
                x = layer(x)
                x = batch_norm(x)
                x = F.relu(x)
                x = self.dropout(x)
            return self.output_layer(x)
    
    class TrainingManager:
        def __init__(self, model, learning_rate=0.001, weight_decay=1e-5):
            self.model = model
            self.criterion = nn.CrossEntropyLoss()
            self.optimizer = optim.Adam(
                model.parameters(),
                lr=learning_rate,
                weight_decay=weight_decay
            )
            self.scheduler = optim.lr_scheduler.ReduceLROnPlateau(
                self.optimizer,
                mode='min',
                patience=5,
                factor=0.5
            )
        
        def train_epoch(self, dataloader):
            self.model.train()
            total_loss = 0
            
            for batch_idx, (data, target) in enumerate(dataloader):
                # Forward pass
                self.optimizer.zero_grad()
                output = self.model(data)
                loss = self.criterion(output, target)
                
                # Backward pass
                loss.backward()
                
                # Gradient clipping
                torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
                
                # Update weights
                self.optimizer.step()
                
                total_loss += loss.item()
            
            return total_loss / len(dataloader)
        
        def validate(self, dataloader):
            self.model.eval()
            total_loss = 0
            correct = 0
            
            with torch.no_grad():
                for data, target in dataloader:
                    output = self.model(data)
                    total_loss += self.criterion(output, target).item()
                    pred = output.argmax(dim=1)
                    correct += pred.eq(target).sum().item()
            
            avg_loss = total_loss / len(dataloader)
            accuracy = correct / len(dataloader.dataset)
            
            # Update learning rate
            self.scheduler.step(avg_loss)
            
            return avg_loss, accuracy
    
    # Example usage
    if __name__ == "__main__":
        # Network architecture
        input_size = 784  # e.g., MNIST
        hidden_sizes = [512, 256, 128]
        output_size = 10
        
        # Create model and training manager
        model = ModernNeuralNetwork(input_size, hidden_sizes, output_size)
        trainer = TrainingManager(model)
        
        # Training loop would go here
        # for epoch in range(num_epochs):
        #     train_loss = trainer.train_epoch(train_loader)
        #     val_loss, val_acc = trainer.validate(val_loader)
    

Interview Examples

Neural Network Architecture Design

How would you design a neural network architecture for a given problem? What factors would you consider?

Explain the relationship between network depth, width, and capacity

How do depth and width affect a neural network's learning capabilities?

Implement a Simple Feedforward Neural Network (NumPy)

Write a basic neural network from scratch using NumPy, including forward and backward propagation.

import numpy as np

class SimpleNN:
    def __init__(self, input_size, hidden_size, output_size):
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        s = self.sigmoid(x)
        return s * (1 - s)

    def forward(self, X):
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2

    def backward(self, X, y, learning_rate=0.1):
        m = X.shape[0]
        # Output layer gradients
        dz2 = self.a2 - y
        dW2 = np.dot(self.a1.T, dz2) / m
        db2 = np.sum(dz2, axis=0, keepdims=True) / m
        # Hidden layer gradients (chain rule)
        da1 = np.dot(dz2, self.W2.T)
        dz1 = da1 * self.sigmoid_derivative(self.z1)
        dW1 = np.dot(X.T, dz1) / m
        db1 = np.sum(dz1, axis=0, keepdims=True) / m
        # Gradient descent updates
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1

# Example usage: learn XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
nn = SimpleNN(2, 4, 1)
for epoch in range(1000):
    output = nn.forward(X)
    nn.backward(X, y)
    if epoch % 100 == 0:
        loss = np.mean((output - y) ** 2)
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

Implement a Simple LSTM Cell (NumPy)

Write a single LSTM cell forward pass using NumPy.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_cell_forward(x_t, h_prev, c_prev, params):
    Wf, bf, Wi, bi, Wo, bo, Wc, bc = params
    # Concatenate previous hidden state and current input
    concat = np.concatenate((h_prev, x_t), axis=1)
    f_t = sigmoid(np.dot(concat, Wf) + bf)      # forget gate
    i_t = sigmoid(np.dot(concat, Wi) + bi)      # input gate
    o_t = sigmoid(np.dot(concat, Wo) + bo)      # output gate
    c_tilde = np.tanh(np.dot(concat, Wc) + bc)  # candidate cell state
    c_next = f_t * c_prev + i_t * c_tilde       # new cell state
    h_next = o_t * np.tanh(c_next)              # new hidden state
    return h_next, c_next

# Example usage: input size 3, hidden size 5
np.random.seed(0)
x_t = np.random.randn(1, 3)
h_prev = np.random.randn(1, 5)
c_prev = np.random.randn(1, 5)
# One (8, 5) weight matrix and one (1, 5) bias per gate, in (W, b) order
params = []
for _ in range(4):
    params.extend([np.random.randn(8, 5), np.random.randn(1, 5)])
h_next, c_next = lstm_cell_forward(x_t, h_prev, c_prev, params)
print(h_next)

Practice Questions

1. Implement a simple feedforward neural network using NumPy Hard

Hint: Break it down into initialization, forward pass, and backward pass
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Assumes x is already a sigmoid activation
    return x * (1 - x)

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)
        self.weights2 = np.random.rand(4, 1)
        self.y = y
        self.output = np.zeros(y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # Chain rule: gradients of the squared error with respect to each weight matrix
        d_weights2 = np.dot(self.layer1.T,
                            2 * (self.y - self.output) * sigmoid_derivative(self.output))
        d_weights1 = np.dot(self.input.T,
                            np.dot(2 * (self.y - self.output) * sigmoid_derivative(self.output),
                                   self.weights2.T) * sigmoid_derivative(self.layer1))
        # Update in the direction that reduces the error
        self.weights1 += d_weights1
        self.weights2 += d_weights2

2. Explain how backpropagation works in a neural network Medium

Hint: Think about the chain rule from calculus
$$ \frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial y_j} \frac{\partial y_j}{\partial w_{ij}} $$
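
A quick way to sanity-check a chain-rule gradient is to compare it against a finite-difference estimate. The sketch below is illustrative only (a single scalar weight, a sigmoid unit, and squared-error loss):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def loss(w, x, y):
    # L = (sigmoid(w * x) - y)^2
    return (sigmoid(w * x) - y) ** 2

x, y, w = 1.5, 1.0, 0.3
a = sigmoid(w * x)
# Chain rule: dL/dw = dL/da * da/dz * dz/dw
analytic = 2 * (a - y) * a * (1 - a) * x
# Central finite-difference estimate
eps = 1e-6
numeric = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
print(analytic, numeric)  # the two values should agree closely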

3. How does the vanishing gradient problem affect deep networks? Hard

Hint: Consider what happens to gradients in very deep networks with certain activation functions
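
As an illustrative sketch (not a complete answer), the snippet below multiplies the local sigmoid derivatives, each at most 0.25, across many layers to show how the backpropagated signal can shrink geometrically with depth (weight factors are ignored here, i.e. assumed to be close to 1):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

np.random.seed(0)
grad = 1.0
for depth in range(1, 51):
    z = np.random.randn()                    # pre-activation at this layer
    grad *= sigmoid(z) * (1 - sigmoid(z))    # local derivative, at most 0.25
    if depth % 10 == 0:
        print(f"depth {depth}: gradient magnitude ~ {grad:.2e}")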