Lesson 3: Neural Networks Basics
Neural networks are the foundation of modern deep learning. In this lesson, you'll learn how artificial neurons work, explore network architectures, and study activation functions and backpropagation, the core algorithm used to train neural networks.
The Biological Inspiration: How Neurons Work
Artificial neural networks are inspired by biological brains. The human brain contains approximately 86 billion neurons connected by synapses. Each neuron receives signals, processes them, and transmits outputs to other neurons.
An artificial neuron mimics this behavior by:
- Receiving multiple inputs (weighted by importance)
- Summing the weighted inputs and adding a bias term
- Applying a non-linear activation function
- Outputting the result to the next layer
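The four steps above can be sketched as a single function. This is a minimal illustration, not a library API: Neuron.Fire is a hypothetical helper name, and sigmoid is chosen as the activation only for the example.

```csharp
using System;

// Hypothetical helper illustrating one artificial neuron:
// output = activation(bias + sum_i(weights[i] * inputs[i]))
public static class Neuron
{
    // Sigmoid is used here only as an example non-linear activation.
    public static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    public static double Fire(double[] inputs, double[] weights, double bias)
    {
        if (inputs.Length != weights.Length)
            throw new ArgumentException("inputs and weights must have the same length");

        // Steps 1-2: weight each input and sum the results, starting from the bias.
        double z = bias;
        for (int i = 0; i < inputs.Length; i++)
            z += inputs[i] * weights[i];

        // Step 3: apply the non-linear activation.
        // Step 4: the caller passes this value on to the next layer.
        return Sigmoid(z);
    }
}
```

For example, Neuron.Fire(new[] { 1.0, 2.0 }, new[] { 0.5, -0.25 }, 0.0) computes a weighted sum of 0.5 - 0.5 = 0 and returns sigmoid(0) = 0.5.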
Perceptrons: The Simplest Neural Network
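A perceptron is the simplest form of a neural network: a single artificial neuron with a step activation function. It is used for binary classification and can only separate classes that are linearly separable. The perceptron learning algorithm adjusts the weights and bias only when it misclassifies a training example.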
// Simple Perceptron Example in C#
using System;
public class Perceptron {
    private readonly double[] weights;
    private double bias;
    private readonly double learningRate = 0.01;
    public Perceptron(int inputSize) {
        weights = new double[inputSize];
        bias = 0;
        // Initialize weights randomly in [-0.5, 0.5)
        var random = new Random();
        for (int i = 0; i < weights.Length; i++)
            weights[i] = random.NextDouble() - 0.5;
    }
    // Weighted sum plus bias, passed through a step activation (returns 0 or 1)
    public int Predict(double[] inputs) {
        double sum = bias;
        for (int i = 0; i < inputs.Length; i++)
            sum += inputs[i] * weights[i];
        return sum >= 0 ? 1 : 0; // Activation: step function
    }
    public void Train(double[][] trainingData, int[] labels, int epochs) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < trainingData.Length; i++) {
                int prediction = Predict(trainingData[i]);
                // error is 0 for a correct prediction and +1 or -1 for a mistake,
                // so weights and bias change only on misclassified examples
                int error = labels[i] - prediction;
                for (int j = 0; j < weights.Length; j++)
                    weights[j] += learningRate * error * trainingData[i][j];
                bias += learningRate * error;
            }
        }
    }
}
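The update inside Train is the classic perceptron learning rule. Written as an equation, with η standing for learningRate, x_j for the j-th input, y for the true label and ŷ for the output of Predict (this notation is introduced here for illustration):

$$w_j \leftarrow w_j + \eta\,(y - \hat{y})\,x_j, \qquad b \leftarrow b + \eta\,(y - \hat{y})$$

Since y and ŷ are each 0 or 1, y - ŷ is 0 for a correct prediction, so nothing changes, and +1 or -1 for a mistake, which moves the decision boundary toward classifying that example correctly.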
Activation Functions
Activation functions introduce non-linearity into neural networks. Without them, stacking layers would be equivalent to a single linear transformation, limiting the network's ability to learn complex patterns.
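To see why, take two layers with weight matrices W₁, W₂ and bias vectors b₁, b₂ (notation assumed for this derivation) and no activation function between them:

$$\mathbf{y} = W_2(W_1\mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2 = (W_2 W_1)\,\mathbf{x} + (W_2\mathbf{b}_1 + \mathbf{b}_2) = W'\mathbf{x} + \mathbf{b}'$$

The result is again a single linear (affine) layer with W' = W₂W₁ and b' = W₂b₁ + b₂, and the same argument applies no matter how many such layers are stacked.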
Sigmoid
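Maps any input to the range (0, 1), so its output can be read as a probability. Commonly used in the output layer for binary classification. Its gradient is at most 0.25 and approaches zero for large positive or negative inputs, so it suffers from the vanishing gradient problem.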
ReLU (Rectified Linear Unit)
Returns max(0, x). The most popular activation for hidden layers in deep learning. It is computationally cheap, and because its gradient is 1 for every positive input, it helps avoid vanishing gradients.
Tanh
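Maps input to the range (-1, 1). Its zero-centered output usually makes it a better hidden-layer choice than sigmoid, but it still saturates for large positive or negative inputs and is prone to vanishing gradients.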
Softmax
Converts a vector of raw scores into a probability distribution: every output is positive and all outputs sum to 1. Used in the output layer for multi-class classification.
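The four functions above can be sketched in a few lines each. This is an illustrative sketch, not a particular library's API; Activations is a hypothetical class name, and the max-subtraction in Softmax is a standard numerical-stability trick added here as an assumption rather than something the lesson specifies.

```csharp
using System;
using System.Linq;

// Sketch of the four activation functions described above.
public static class Activations
{
    // Sigmoid: maps any real number into (0, 1).
    public static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Derivative of sigmoid: s(x) * (1 - s(x)). Its maximum is 0.25 at x = 0,
    // which is why gradients shrink when many sigmoid layers are stacked.
    public static double SigmoidDerivative(double x)
    {
        double s = Sigmoid(x);
        return s * (1.0 - s);
    }

    // ReLU: max(0, x); its gradient is 1 for x > 0 and 0 for x < 0.
    public static double ReLU(double x) => Math.Max(0.0, x);

    // Tanh: maps any real number into (-1, 1) and is zero-centered.
    public static double Tanh(double x) => Math.Tanh(x);

    // Softmax: turns a vector of scores into positive values that sum to 1.
    // Subtracting the maximum score first avoids overflow in Math.Exp
    // without changing the result.
    public static double[] Softmax(double[] scores)
    {
        double max = scores.Max();
        double[] exps = scores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

Without the max-subtraction, Softmax(new[] { 1000.0, 1000.0 }) would compute Math.Exp(1000), which overflows to infinity and produces NaN; with it, the result is [0.5, 0.5].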
Network Architecture: Layers and Connections
Neural networks consist of three types of layers:
- Input Layer: Receives raw data (not counted in layer depth)
- Hidden Layers: Transform the data and learn intermediate patterns (a network can have anywhere from one to thousands of them)
- Output Layer: Produces final predictions
A network with more than one hidden layer is called a deep neural network. The depth allows the network to learn hierarchical representations—lower layers detect simple features, while deeper layers combine them into complex concepts.
// Simple Multi-Layer Network Architecture
Input Layer: 784 neurons (28×28 pixel images)
↓
Hidden Layer 1: 128 neurons (ReLU activation)
↓
Hidden Layer 2: 64 neurons (ReLU activation)
↓
Hidden Layer 3: 32 neurons (ReLU activation)
↓
Output Layer: 10 neurons (Softmax activation, for digits 0-9)
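One way to get a feel for the size of this architecture is to count its trainable parameters. In a fully connected layer with n_in inputs and n_out neurons there are n_in × n_out weights plus n_out biases. The sketch below (ParameterCounter is a hypothetical name) applies that formula layer by layer.

```csharp
using System;

// Sketch: count trainable parameters in a fully connected network.
// A dense layer with nIn inputs and nOut neurons has nIn * nOut weights plus nOut biases.
public static class ParameterCounter
{
    public static long CountParameters(int[] layerSizes)
    {
        long total = 0;
        for (int l = 1; l < layerSizes.Length; l++)
        {
            long weights = (long)layerSizes[l - 1] * layerSizes[l];
            long biases = layerSizes[l];
            Console.WriteLine($"Layer {l}: {layerSizes[l - 1]} -> {layerSizes[l]}: {weights + biases} parameters");
            total += weights + biases;
        }
        return total;
    }
}
```

For the 784-128-64-32-10 network above, CountParameters(new[] { 784, 128, 64, 32, 10 }) gives 100,480 + 8,256 + 2,080 + 330 = 111,146 parameters, about 90% of them in the first hidden layer.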
Forward Propagation: How Data Flows Through the Network
Forward propagation is the process of passing input data through the network to generate predictions:
- Input data enters the input layer
- Each neuron in the next layer computes a weighted sum of the outputs of all neurons in the previous layer (in a fully connected layer), plus a bias
- An activation function transforms that sum
- Process repeats through all hidden layers
- Output layer produces final predictions
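Because each layer amounts to a matrix-vector multiplication followed by an element-wise activation, forward propagation is fast; on modern hardware, networks with millions of parameters can typically produce a prediction in milliseconds.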
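The steps above can be sketched for any stack of fully connected layers. This is a minimal sketch under assumptions not stated in the lesson: weights are stored as weights[layer][neuron][input], hidden layers use ReLU, and the output layer uses softmax, matching the example architecture. ForwardPass, Dense and Predict are hypothetical names.

```csharp
using System;
using System.Linq;

// Sketch of forward propagation through fully connected layers.
public static class ForwardPass
{
    // One dense layer: each neuron computes bias + sum(w * input), then an activation.
    public static double[] Dense(double[] input, double[][] weights, double[] biases,
                                 Func<double, double> activation)
    {
        var output = new double[biases.Length];
        for (int n = 0; n < biases.Length; n++)
        {
            double z = biases[n];
            for (int i = 0; i < input.Length; i++)
                z += weights[n][i] * input[i];
            output[n] = activation(z);
        }
        return output;
    }

    // Numerically stable softmax (subtract the max before exponentiating).
    public static double[] Softmax(double[] z)
    {
        double max = z.Max();
        double[] exps = z.Select(v => Math.Exp(v - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }

    // Passes the input through every layer in order. Hidden layers use ReLU;
    // the last layer produces raw scores that softmax turns into probabilities.
    public static double[] Predict(double[] input, double[][][] weights, double[][] biases)
    {
        double[] layerOutput = input;
        int last = weights.Length - 1;
        for (int l = 0; l < last; l++)
            layerOutput = Dense(layerOutput, weights[l], biases[l], v => Math.Max(0.0, v)); // ReLU
        double[] scores = Dense(layerOutput, weights[last], biases[last], v => v);          // identity
        return Softmax(scores);
    }
}
```

The same Predict loop works for the 784-128-64-32-10 network above; only the sizes of the weight and bias arrays change.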
Backpropagation: The Training Algorithm
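Backpropagation is the algorithm that trains neural networks. It works in two phases:
- Forward pass: input data flows through the network to produce a prediction, and a loss function measures how far that prediction is from the target.
- Backward pass: the chain rule is used to compute the gradient of the loss with respect to every weight and bias, and an optimizer such as gradient descent then updates each parameter to reduce the loss.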
The name "backpropagation" comes from the fact that errors propagate backward through the network, from the output layer to the input layer, allowing the algorithm to determine how much each weight contributed to the final error.
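The two phases can be sketched for a tiny network. Everything here is an illustrative assumption rather than the lesson's own code: the class TinyNetwork, one sigmoid hidden layer, a single sigmoid output, and the squared-error loss L = 0.5 (ŷ - y)² on one training example. The comments show how the chain rule turns the output error into a gradient for each layer's weights.

```csharp
using System;

// Sketch of backpropagation: inputs -> hidden layer (sigmoid) -> one output (sigmoid),
// squared-error loss L = 0.5 * (yHat - y)^2 on a single example.
public class TinyNetwork
{
    public readonly double[,] W1;  // W1[h, i]: weight from input i to hidden neuron h
    public readonly double[] B1;   // hidden biases
    public readonly double[] W2;   // W2[h]: weight from hidden neuron h to the output
    public double B2;              // output bias

    public TinyNetwork(int inputs, int hidden, int seed)
    {
        var rng = new Random(seed);
        W1 = new double[hidden, inputs];
        B1 = new double[hidden];
        W2 = new double[hidden];
        for (int h = 0; h < hidden; h++)
        {
            for (int i = 0; i < inputs; i++) W1[h, i] = rng.NextDouble() - 0.5;
            W2[h] = rng.NextDouble() - 0.5;
        }
    }

    static double Sigmoid(double z) => 1.0 / (1.0 + Math.Exp(-z));

    // Phase 1 (forward pass): fill in hidden activations and return the prediction.
    public double Forward(double[] x, double[] hidden)
    {
        double z2 = B2;
        for (int h = 0; h < B1.Length; h++)
        {
            double z1 = B1[h];
            for (int i = 0; i < x.Length; i++) z1 += W1[h, i] * x[i];
            hidden[h] = Sigmoid(z1);
            z2 += W2[h] * hidden[h];
        }
        return Sigmoid(z2);
    }

    public double Loss(double[] x, double y)
    {
        double yHat = Forward(x, new double[B1.Length]);
        return 0.5 * (yHat - y) * (yHat - y);
    }

    // Phase 2 (backward pass): apply the chain rule from the output back to W1.
    public void Gradients(double[] x, double y, double[,] gW1, double[] gB1,
                          double[] gW2, out double gB2)
    {
        var a = new double[B1.Length];
        double yHat = Forward(x, a);
        // dL/dz2 = (yHat - y) * sigmoid'(z2) = (yHat - y) * yHat * (1 - yHat)
        double delta2 = (yHat - y) * yHat * (1 - yHat);
        gB2 = delta2;
        for (int h = 0; h < B1.Length; h++)
        {
            gW2[h] = delta2 * a[h];
            // Error sent back to hidden neuron h, times its sigmoid derivative.
            double delta1 = delta2 * W2[h] * a[h] * (1 - a[h]);
            gB1[h] = delta1;
            for (int i = 0; i < x.Length; i++) gW1[h, i] = delta1 * x[i];
        }
    }

    // One gradient-descent step; all gradients are computed before any weight changes.
    public void TrainStep(double[] x, double y, double learningRate)
    {
        int hidden = B1.Length, inputs = x.Length;
        var gW1 = new double[hidden, inputs];
        var gB1 = new double[hidden];
        var gW2 = new double[hidden];
        Gradients(x, y, gW1, gB1, gW2, out double gB2);
        for (int h = 0; h < hidden; h++)
        {
            for (int i = 0; i < inputs; i++) W1[h, i] -= learningRate * gW1[h, i];
            B1[h] -= learningRate * gB1[h];
            W2[h] -= learningRate * gW2[h];
        }
        B2 -= learningRate * gB2;
    }
}
```

A common way to check a hand-written backward pass like this is a numerical gradient check: nudge one weight by a small ε, measure the change in Loss, and compare (L(w + ε) - L(w - ε)) / 2ε with the value Gradients produced.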
🧠 Quick Check — Lesson 3
What is the purpose of activation functions in neural networks?
In backpropagation, which direction do gradients flow to update weights?
Lesson Summary
Artificial neurons mimic biological neurons by combining weighted inputs with bias and applying activation functions.
Perceptrons are single-neuron networks for binary classification; deeper networks learn more complex patterns.
Activation functions (ReLU, Sigmoid, Tanh) introduce non-linearity, allowing networks to learn non-linear decision boundaries.
Forward propagation passes data through the layers to produce predictions; backpropagation computes the gradient of the loss with respect to every weight, which gradient descent then uses to update the weights.
Deep networks have multiple hidden layers that learn hierarchical features from simple to complex.