Recurrent Neural Networks (RNNs)

Overview

Recurrent Neural Networks (RNNs) are specialized architectures designed for processing sequential data. They maintain an internal state (memory) that allows them to capture temporal dependencies in the input sequence.

RNN Architecture

Basic architecture of a Recurrent Neural Network showing the unfolded computational graph.

Key aspects:

  • Sequential processing
  • Hidden state memory
  • Backpropagation through time (BPTT; see the training sketch below)
  • Vanishing/exploding gradients
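
Since training an RNN means backpropagating the loss through every timestep, a minimal PyTorch sketch of that loop is shown below. It is an illustrative sketch only: the MSE objective, shapes, and hyperparameters are arbitrary assumptions, not taken from this section.

    import torch
    import torch.nn as nn

    # Minimal BPTT sketch: unroll a vanilla RNN over a sequence, compute a
    # per-timestep loss, and backpropagate through all timesteps at once.
    seq_len, batch, input_size, hidden_size = 12, 2, 3, 8

    rnn = nn.RNN(input_size, hidden_size)        # vanilla tanh RNN
    readout = nn.Linear(hidden_size, 1)          # per-timestep output layer
    criterion = nn.MSELoss()

    x = torch.randn(seq_len, batch, input_size)  # (time, batch, features)
    targets = torch.randn(seq_len, batch, 1)

    outputs, _ = rnn(x)                          # outputs: (time, batch, hidden)
    loss = criterion(readout(outputs), targets)  # loss averaged over all timesteps
    loss.backward()                              # gradients flow back through every step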

Core Concepts

  • Basic RNN Cell

    The fundamental building block of RNNs:

    • Input transformation
    • State update
    • Output generation
    • Activation functions
    $$ h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t + b_h)\\ y_t = W_{hy}h_t + b_y\\ \text{BPTT Loss} = \sum_{t=1}^T L(y_t, \hat{y}_t) $$
  • LSTM Architecture

    Long Short-Term Memory units for better gradient flow:

    • Input gate
    • Forget gate
    • Output gate
    • Cell state
    $$ f_t = \sigma(W_f[h_{t-1}, x_t] + b_f)\\ i_t = \sigma(W_i[h_{t-1}, x_t] + b_i)\\ c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c[h_{t-1}, x_t] + b_c) $$
  • GRU Architecture

    Gated Recurrent Units for efficient computation (see the sketch below):

    • Reset gate
    • Update gate
    • Candidate state
    • Final state
    $$ z_t = \sigma(W_z[h_{t-1}, x_t] + b_z)\\ r_t = \sigma(W_r[h_{t-1}, x_t] + b_r)\\ h_t = z_t \odot h_{t-1} + (1-z_t) \odot \tanh(W_h[r_t \odot h_{t-1}, x_t] + b_h) $$
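
    As a from-scratch companion to the gate equations above, here is a minimal PyTorch sketch of a GRU cell. It follows this section's convention (the update gate z_t retains the previous state); the class and layer names are illustrative assumptions, not part of torch.nn.

    import torch
    import torch.nn as nn

    # Minimal GRU cell matching the equations above; update gate z keeps the
    # previous state and (1 - z) admits the candidate state.
    class CustomGRUCell(nn.Module):
        def __init__(self, input_size, hidden_size):
            super().__init__()
            self.W_z = nn.Linear(input_size + hidden_size, hidden_size)  # update gate
            self.W_r = nn.Linear(input_size + hidden_size, hidden_size)  # reset gate
            self.W_h = nn.Linear(input_size + hidden_size, hidden_size)  # candidate state
        def forward(self, x, h):
            xh = torch.cat([x, h], dim=-1)
            z = torch.sigmoid(self.W_z(xh))
            r = torch.sigmoid(self.W_r(xh))
            h_tilde = torch.tanh(self.W_h(torch.cat([x, r * h], dim=-1)))
            return z * h + (1 - z) * h_tilde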

Implementation

  • PyTorch RNN, LSTM, and GRU

    Example implementations of RNN, LSTM, and GRU using PyTorch, followed by a brief usage sketch:

    
    import torch
    import torch.nn as nn
    
    # Vanilla RNN cell: new hidden state from the current input and the previous hidden state
    class BasicRNNCell(nn.Module):
        def __init__(self, input_size, hidden_size):
            super(BasicRNNCell, self).__init__()
            self.ih = nn.Linear(input_size, hidden_size)
            self.hh = nn.Linear(hidden_size, hidden_size)
        def forward(self, x, h):
            h = torch.tanh(self.ih(x) + self.hh(h))
            return h
    
    # LSTM cell with explicit input (i), forget (f), output (o) gates and candidate update (g)
    class CustomLSTMCell(nn.Module):
        def __init__(self, input_size, hidden_size):
            super(CustomLSTMCell, self).__init__()
            self.W_ii = nn.Linear(input_size, hidden_size)
            self.W_hi = nn.Linear(hidden_size, hidden_size)
            self.W_if = nn.Linear(input_size, hidden_size)
            self.W_hf = nn.Linear(hidden_size, hidden_size)
            self.W_io = nn.Linear(input_size, hidden_size)
            self.W_ho = nn.Linear(hidden_size, hidden_size)
            self.W_ig = nn.Linear(input_size, hidden_size)
            self.W_hg = nn.Linear(hidden_size, hidden_size)
        def forward(self, x, h, c):
            i = torch.sigmoid(self.W_ii(x) + self.W_hi(h))
            f = torch.sigmoid(self.W_if(x) + self.W_hf(h))
            o = torch.sigmoid(self.W_io(x) + self.W_ho(h))
            g = torch.tanh(self.W_ig(x) + self.W_hg(h))
            c = f * c + i * g
            h = o * torch.tanh(c)
            return h, c
    
    # Thin wrapper around the built-in nn.RNN / nn.LSTM / nn.GRU modules;
    # without batch_first, inputs are expected as (seq_len, batch, input_size)
    class SequenceModel(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, cell_type='lstm', bidirectional=False, dropout=0.0):
            super(SequenceModel, self).__init__()
            self.cell_type = cell_type.lower()
            if self.cell_type == 'rnn':
                self.rnn = nn.RNN(input_size, hidden_size, num_layers, bidirectional=bidirectional, dropout=dropout)
            elif self.cell_type == 'lstm':
                self.rnn = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=bidirectional, dropout=dropout)
            elif self.cell_type == 'gru':
                self.rnn = nn.GRU(input_size, hidden_size, num_layers, bidirectional=bidirectional, dropout=dropout)
            else:
                raise ValueError(f"Unknown cell type: {cell_type}")
        def forward(self, x, h=None):
            return self.rnn(x, h)
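
    A short usage sketch of SequenceModel follows; the shapes are illustrative assumptions, and since batch_first is not set the built-in modules expect input shaped (seq_len, batch, input_size).

    # Illustrative usage of the SequenceModel defined above
    model = SequenceModel(input_size=8, hidden_size=16, num_layers=2, cell_type='gru')
    x = torch.randn(12, 4, 8)      # (seq_len=12, batch=4, input_size=8)
    outputs, h_n = model(x)        # outputs: (12, 4, 16), h_n: (2, 4, 16)
    print(outputs.shape, h_n.shape)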
    
  • NumPy RNN from Scratch

    Manual implementation of a simple RNN cell in NumPy for educational purposes:

    
    import numpy as np
    
    def rnn_step(x_t, h_prev, Wx, Wh, b):
        return np.tanh(np.dot(x_t, Wx) + np.dot(h_prev, Wh) + b)
    
    # Example usage
    x_seq = np.random.randn(5, 3)  # 5 timesteps, 3 features
    h = np.zeros((1, 4))           # hidden size 4
    Wx = np.random.randn(3, 4)
    Wh = np.random.randn(4, 4)
    b = np.zeros((1, 4))
    for t in range(5):
        h = rnn_step(x_seq[t:t+1], h, Wx, Wh, b)
        print(h)
    

Interview Examples

Vanishing Gradients in RNNs

Explain the vanishing gradient problem in RNNs. How do LSTM and GRU architectures address this issue?

Key points:

  • In a standard RNN, gradients are repeatedly multiplied by the recurrent weight matrix during backpropagation through time (BPTT).
  • When those repeated factors are smaller than 1 in magnitude, gradients shrink exponentially with sequence length (vanish); when larger than 1, they explode.
  • LSTM and GRU introduce gating mechanisms and memory cells whose additive updates preserve gradient flow, allowing the network to learn long-term dependencies.
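
To make the decay concrete, here is a small PyTorch demo (an illustrative sketch, not part of the original answer) that measures the gradient reaching the first timestep of a vanilla RNN as the sequence grows; with the default tanh cell the norm typically shrinks sharply.

    import torch
    import torch.nn as nn

    # Gradient of a loss on the final hidden state, measured at the first input,
    # for increasingly long sequences. Sizes and seed are arbitrary choices.
    torch.manual_seed(0)
    rnn = nn.RNN(input_size=4, hidden_size=16)

    for seq_len in (5, 20, 80):
        x = torch.randn(seq_len, 1, 4, requires_grad=True)
        outputs, _ = rnn(x)
        loss = outputs[-1].sum()   # loss depends only on the final hidden state
        loss.backward()
        print(f"seq_len={seq_len:3d}  grad norm at t=0: {x.grad[0].norm().item():.2e}")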

Implement a Simple RNN in NumPy

Write a function to perform a forward pass of a basic RNN cell for a sequence.

    import numpy as np

    def rnn_forward(X, h0, Wx, Wh, b):
        h = h0
        outputs = []
        for t in range(X.shape[0]):
            h = np.tanh(np.dot(X[t], Wx) + np.dot(h, Wh) + b)
            outputs.append(h)
        return np.stack(outputs)

    # Example usage:
    # X = np.random.randn(10, 8)   # 10 timesteps, 8 features
    # h0 = np.zeros((4,))          # hidden size 4
    # Wx = np.random.randn(8, 4)
    # Wh = np.random.randn(4, 4)
    # b = np.zeros((4,))
    # out = rnn_forward(X, h0, Wx, Wh, b)

Practice Questions

1. How would you implement an RNN-based model in a production environment? (Hard)

Hint: Consider scalability and efficiency

2. Explain the core concepts of RNNs. (Easy)

Hint: Think about the fundamental principles

3. What are the practical applications of RNNs? (Medium)

Hint: Consider both academic and industry use cases