Recurrent Neural Networks (RNNs)

Overview

Recurrent Neural Networks (RNNs) are specialized architectures designed for processing sequential data. They maintain an internal state (memory) that allows them to capture temporal dependencies in the input sequence.

RNN Architecture

Basic architecture of a Recurrent Neural Network showing the unfolded computational graph.

Key aspects:

  • Sequential processing
  • Hidden state memory
  • Backpropagation through time (BPTT; see the training sketch below)
  • Vanishing/exploding gradients
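
Since training an RNN means backpropagating the loss through every timestep, a minimal PyTorch sketch of that loop is shown below. It is an illustrative sketch only: the MSE objective, shapes, and hyperparameters are arbitrary assumptions, not taken from this section.

    import torch
    import torch.nn as nn

    # Minimal BPTT sketch: unroll a vanilla RNN over a sequence, compute a
    # per-timestep loss, and backpropagate through all timesteps at once.
    seq_len, batch, input_size, hidden_size = 12, 2, 3, 8

    rnn = nn.RNN(input_size, hidden_size)        # vanilla tanh RNN
    readout = nn.Linear(hidden_size, 1)          # per-timestep output layer
    criterion = nn.MSELoss()

    x = torch.randn(seq_len, batch, input_size)  # (time, batch, features)
    targets = torch.randn(seq_len, batch, 1)

    outputs, _ = rnn(x)                          # outputs: (time, batch, hidden)
    loss = criterion(readout(outputs), targets)  # loss averaged over all timesteps
    loss.backward()                              # gradients flow back through every step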

Core Concepts

  • Basic RNN Cell

    The fundamental building block of RNNs:

    • Input transformation
    • State update
    • Output generation
    • Activation functions
    $$ h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t + b_h)\\ y_t = W_{hy}h_t + b_y\\ \text{BPTT Loss} = \sum_{t=1}^T L(y_t, \hat{y}_t) $$
  • LSTM Architecture

    Long Short-Term Memory units for better gradient flow:

    • Input gate
    • Forget gate
    • Output gate
    • Cell state
    $$ f_t = \sigma(W_f[h_{t-1}, x_t] + b_f)\\ i_t = \sigma(W_i[h_{t-1}, x_t] + b_i)\\ c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c[h_{t-1}, x_t] + b_c) $$
  • GRU Architecture

    Gated Recurrent Units for efficient computation (see the sketch below):

    • Reset gate
    • Update gate
    • Candidate state
    • Final state
    $$ z_t = \sigma(W_z[h_{t-1}, x_t] + b_z)\\ r_t = \sigma(W_r[h_{t-1}, x_t] + b_r)\\ h_t = z_t \odot h_{t-1} + (1-z_t) \odot \tanh(W_h[r_t \odot h_{t-1}, x_t] + b_h) $$
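
    As a from-scratch companion to the gate equations above, here is a minimal PyTorch sketch of a GRU cell. It follows this section's convention (the update gate z_t retains the previous state); the class and layer names are illustrative assumptions, not part of torch.nn.

    import torch
    import torch.nn as nn

    # Minimal GRU cell matching the equations above; update gate z keeps the
    # previous state and (1 - z) admits the candidate state.
    class CustomGRUCell(nn.Module):
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.W_z = nn.Linear(input_size + hidden_size, hidden_size)  # update gate
            self.W_r = nn.Linear(input_size + hidden_size, hidden_size)  # reset gate
            self.W_h = nn.Linear(input_size + hidden_size, hidden_size)  # candidate state
        def forward(self, x, h):
            xh = torch.cat([x, h], dim=-1)
            z = torch.sigmoid(self.W_z(xh))
            r = torch.sigmoid(self.W_r(xh))
            h_tilde = torch.tanh(self.W_h(torch.cat([x, r * h], dim=-1)))
            return z * h + (1 - z) * h_tilde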

Implementation

  • PyTorch RNN, LSTM, and GRU

    Example implementations of RNN, LSTM, and GRU using PyTorch, followed by a brief usage sketch:

    
    import torch
    import torch.nn as nn
    
    # Vanilla RNN cell: new hidden state from the current input and the previous hidden state
    class BasicRNNCell(nn.Module):
        def __init__(self, input_size, hidden_size):
            super(BasicRNNCell, self).__init__()
            self.ih = nn.Linear(input_size, hidden_size)
            self.hh = nn.Linear(hidden_size, hidden_size)
        def forward(self, x, h):
            h = torch.tanh(self.ih(x) + self.hh(h))
            return h
    
    # LSTM cell with explicit input (i), forget (f), output (o) gates and candidate update (g)
    class CustomLSTMCell(nn.Module):
        def __init__(self, input_size, hidden_size):
            super(CustomLSTMCell, self).__init__()
            self.W_ii = nn.Linear(input_size, hidden_size)
            self.W_hi = nn.Linear(hidden_size, hidden_size)
            self.W_if = nn.Linear(input_size, hidden_size)
            self.W_hf = nn.Linear(hidden_size, hidden_size)
            self.W_io = nn.Linear(input_size, hidden_size)
            self.W_ho = nn.Linear(hidden_size, hidden_size)
            self.W_ig = nn.Linear(input_size, hidden_size)
            self.W_hg = nn.Linear(hidden_size, hidden_size)
        def forward(self, x, h, c):
            i = torch.sigmoid(self.W_ii(x) + self.W_hi(h))
            f = torch.sigmoid(self.W_if(x) + self.W_hf(h))
            o = torch.sigmoid(self.W_io(x) + self.W_ho(h))
            g = torch.tanh(self.W_ig(x) + self.W_hg(h))
            c = f * c + i * g
            h = o * torch.tanh(c)
            return h, c
    
    # Thin wrapper around the built-in nn.RNN / nn.LSTM / nn.GRU modules;
    # without batch_first, inputs are expected as (seq_len, batch, input_size)
    class SequenceModel(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, cell_type='lstm', bidirectional=False, dropout=0.0):
            super(SequenceModel, self).__init__()
            self.cell_type = cell_type.lower()
            if self.cell_type == 'rnn':
                self.rnn = nn.RNN(input_size, hidden_size, num_layers, bidirectional=bidirectional, dropout=dropout)
            elif self.cell_type == 'lstm':
                self.rnn = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=bidirectional, dropout=dropout)
            elif self.cell_type == 'gru':
                self.rnn = nn.GRU(input_size, hidden_size, num_layers, bidirectional=bidirectional, dropout=dropout)
            else:
                raise ValueError(f"Unknown cell type: {cell_type}")
        def forward(self, x, h=None):
            return self.rnn(x, h)
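
    A short usage sketch of SequenceModel follows; the shapes are illustrative assumptions, and since batch_first is not set the built-in modules expect input shaped (seq_len, batch, input_size).

    # Illustrative usage of the SequenceModel defined above
    model = SequenceModel(input_size=8, hidden_size=16, num_layers=2, cell_type='gru')
    x = torch.randn(12, 4, 8)      # (seq_len=12, batch=4, input_size=8)
    outputs, h_n = model(x)        # outputs: (12, 4, 16), h_n: (2, 4, 16)
    print(outputs.shape, h_n.shape)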
    
  • NumPy RNN from Scratch

    Manual implementation of a simple RNN cell in NumPy for educational purposes:

    
    import numpy as np
    
    def rnn_step(x_t, h_prev, Wx, Wh, b):
        return np.tanh(np.dot(x_t, Wx) + np.dot(h_prev, Wh) + b)
    
    # Example usage
    x_seq = np.random.randn(5, 3)  # 5 timesteps, 3 features
    h = np.zeros((1, 4))           # hidden size 4
    Wx = np.random.randn(3, 4)
    Wh = np.random.randn(4, 4)
    b = np.zeros((1, 4))
    for t in range(5):
        h = rnn_step(x_seq[t:t+1], h, Wx, Wh, b)
        print(h)
    

Interview Examples

Vanishing Gradients in RNNs

Explain the vanishing gradient problem in RNNs. How do LSTM and GRU architectures address this issue?

Key points:

  • In a standard RNN, gradients are repeatedly multiplied by the recurrent weight matrix during backpropagation through time (BPTT).
  • When those repeated factors are smaller than 1 in magnitude, gradients shrink exponentially with sequence length (vanish); when larger than 1, they explode.
  • LSTM and GRU introduce gating mechanisms and memory cells whose additive updates preserve gradient flow, allowing the network to learn long-term dependencies.
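
To make the decay concrete, here is a small PyTorch demo (an illustrative sketch, not part of the original answer) that measures the gradient reaching the first timestep of a vanilla RNN as the sequence grows; with the default tanh cell the norm typically shrinks sharply.

    import torch
    import torch.nn as nn

    # Gradient of a loss on the final hidden state, measured at the first input,
    # for increasingly long sequences. Sizes and seed are arbitrary choices.
    torch.manual_seed(0)
    rnn = nn.RNN(input_size=4, hidden_size=16)

    for seq_len in (5, 20, 80):
        x = torch.randn(seq_len, 1, 4, requires_grad=True)
        outputs, _ = rnn(x)
        loss = outputs[-1].sum()   # loss depends only on the final hidden state
        loss.backward()
        print(f"seq_len={seq_len:3d}  grad norm at t=0: {x.grad[0].norm().item():.2e}")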

Implement a Simple RNN in NumPy

Write a function to perform a forward pass of a basic RNN cell for a sequence.

    import numpy as np

    def rnn_forward(X, h0, Wx, Wh, b):
        h = h0
        outputs = []
        for t in range(X.shape[0]):
            h = np.tanh(np.dot(X[t], Wx) + np.dot(h, Wh) + b)
            outputs.append(h)
        return np.stack(outputs)

    # Example usage:
    # X = np.random.randn(10, 8)   # 10 timesteps, 8 features
    # h0 = np.zeros((4,))          # hidden size 4
    # Wx = np.random.randn(8, 4)
    # Wh = np.random.randn(4, 4)
    # b = np.zeros((4,))
    # out = rnn_forward(X, h0, Wx, Wh, b)

Practice Questions

1. How would you implement an RNN-based model in a production environment? (Hard)

Hint: Consider scalability and efficiency

2. Explain the core concepts of RNNs. (Easy)

Hint: Think about the fundamental principles

3. What are the practical applications of RNNs? (Medium)

Hint: Consider both academic and industry use cases