Galaxy Classification with Deep Learning
July 9, 2023
Building a CNN to classify galaxy morphologies using the Galaxy10 dataset
Project Overview
This project implements a convolutional neural network (CNN) to classify galaxy images into 10 different morphological categories. Using the Galaxy10 dataset from DECaLS (Dark Energy Camera Legacy Survey), I trained a deep learning model to automatically identify galaxy types based on their visual characteristics.
Dataset & Methodology
The Galaxy10 DECaLS dataset contains over 17,000 galaxy images labeled by citizen scientists through the Galaxy Zoo project. The images are natively 256×256 pixels with three color channels and are resized to 69×69 pixels during preprocessing. Each image belongs to one of ten morphological classes:
- Completely round smooth galaxy
- In-between smooth galaxy
- Cigar-shaped smooth galaxy
- Edge-on galaxy (no bulge)
- Edge-on galaxy (with bulge)
- Spiral galaxy
- Galaxy with bar
- Galaxy with no bulge
- Galaxy with just noticeable bulge
- Galaxy with obvious bulge
Model Architecture
The CNN consists of four 3×3 convolutional layers with an increasing number of filters (32, 64, 128, 256) and 2×2 max pooling after each of the first three, followed by global average pooling, a 128-unit dense layer with dropout, and a 10-way softmax output:
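```python
import tensorflow as tf

model = tf.keras.Sequential([
    # Block 1: 32 filters of size 3x3 on a 69x69 RGB input
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(69, 69, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Block 2: 64 filters
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Block 3: 128 filters
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Block 4: 256 filters, no pooling afterwards
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu'),
    # Average each feature map to one value, giving a 256-dim vector
    tf.keras.layers.GlobalAveragePooling2D(),
    # Classification head; dropout zeroes 40% of units during training
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    # Softmax over the 10 galaxy classes
    tf.keras.layers.Dense(10, activation='softmax')
])
```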
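To make the architecture concrete, the short pure-Python sketch below traces the spatial size of the feature maps and the total parameter count through each layer. It assumes the Keras defaults the model relies on: stride 1 with no padding ('valid') for the 3×3 convolutions, and 2×2 pooling with stride 2 that floors odd sizes. The helper names are illustrative, not part of any library.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Spatial size after non-overlapping max pooling (floors odd sizes)."""
    return size // pool


def trace_model(size: int = 69, channels: int = 3):
    """Return (spatial sizes after each conv/pool layer, total parameter count)."""
    sizes = [size]
    params = 0
    in_ch = channels
    # (filters, followed by max pooling?) for the four convolutional blocks
    for filters, pooled in [(32, True), (64, True), (128, True), (256, False)]:
        size = conv_out(size)
        # Each filter has a 3x3xin_ch kernel plus one bias term
        params += (3 * 3 * in_ch + 1) * filters
        sizes.append(size)
        if pooled:
            size = pool_out(size)
            sizes.append(size)
        in_ch = filters
    # Global average pooling leaves a vector of length in_ch (256);
    # dropout adds no parameters
    for units in (128, 10):
        params += (in_ch + 1) * units  # dense weights plus biases
        in_ch = units
    return sizes, params


sizes, params = trace_model()
print(sizes)   # [69, 67, 33, 31, 15, 13, 6, 4]
print(params)  # 422602
```

The last convolution therefore sees only a 6×6 map and outputs 4×4, which is why global average pooling (rather than flattening) keeps the dense head small.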
Training Process
The model was trained for 20 epochs with the following configuration:
- Optimizer: Adam
- Loss Function: Categorical Crossentropy
- Batch Size: 64
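- Test Split: 20% of the total data, held out for final evaluation
- Train/Validation Split: 90/10 of the remaining 80% (roughly 72% training and 8% validation overall)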
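The two-stage split in this configuration can be sketched with NumPy: hold out the test fraction of all samples first, then carve the validation fraction out of what remains. This is a minimal sketch assuming a simple random permutation; the post does not say which tool performed the split, and the function name and seed are hypothetical. The example uses 17,736 samples, the size of Galaxy10 DECaLS.

```python
import numpy as np


def three_way_split(n, test_frac=0.2, val_frac=0.1, seed=0):
    """Shuffle indices 0..n-1 into disjoint train/validation/test sets.

    test_frac is a fraction of all samples; val_frac is a fraction of
    what remains after the test set has been removed.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    n_val = int(round((n - n_test) * val_frac))
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx


train_idx, val_idx, test_idx = three_way_split(17736)
print(len(train_idx), len(val_idx), len(test_idx))  # 12770 1419 3547
```

The index arrays can then select rows from the preprocessed image and label arrays, for example `X[train_idx]`.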
Training Results
The model showed steady improvement throughout training:
- Initial Training Accuracy: 76.04%
- Final Training Accuracy: 88.87%
- Best Validation Accuracy: 72.66%
- Final Test Accuracy: 73.98%
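With one-hot labels and a categorical crossentropy loss, Keras reports accuracy as the fraction of images whose highest-probability class matches the true class. The NumPy sketch below computes that metric and the mean categorical crossentropy on toy arrays; the numbers are illustrative and are not outputs of this model.

```python
import numpy as np


def categorical_accuracy(y_true, y_prob):
    """Fraction of rows where the argmax prediction matches the one-hot label."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_prob, axis=1)))


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over rows of -sum(y_true * log(p)); eps guards against log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=1)))


# Toy example: 3 images, 3 classes (illustrative numbers only)
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.3, 0.4, 0.3],
                   [0.5, 0.3, 0.2]])
print(categorical_accuracy(y_true, y_prob))  # 0.6666666666666666
print(categorical_crossentropy(y_true, y_prob))
```

Note that the second image counts as correct even though its top probability is only 0.4: accuracy ignores confidence, while the crossentropy loss penalizes it.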
Implementation Details
Data Preprocessing
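```python
import cv2
import numpy as np
import tensorflow as tf
from astroNN.datasets import galaxy10  # assumed: astroNN's Galaxy10 DECaLS loader

# Load uint8 images (256x256x3) and integer class labels 0-9
X, y = galaxy10.load_data()
# Resize each image to 69x69 and scale pixel values to [0, 1]
X = np.array([cv2.resize(img, (69, 69)) for img in X],
             dtype='float32') / 255.0
# One-hot encode the labels for categorical crossentropy
y_cat = tf.keras.utils.to_categorical(y, 10)
```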
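The one-hot encoding that `to_categorical` produces can be reproduced in plain NumPy by indexing rows of an identity matrix, which makes the label format explicit. This sketch uses made-up labels and assumes integer class indices in the range 0-9.

```python
import numpy as np

# Hypothetical integer labels for five images (classes 0-9)
y = np.array([3, 0, 9, 5, 5])

# Row i of the 10x10 identity matrix is the one-hot vector for class i
y_cat = np.eye(10, dtype='float32')[y]

print(y_cat.shape)           # (5, 10)
print(y_cat.argmax(axis=1))  # [3 0 9 5 5]
```

Taking the argmax of each row recovers the original labels, which is also how predictions are mapped back to class indices.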
Model Testing Results
Test Image Analysis
Here's an example of the model in action on a real galaxy image:

Original galaxy image used for testing
Prediction Results

Model prediction output showing classification results
Model Prediction:
- Predicted Class: Spiral galaxy
- Confidence: 46%
- Processing Time: <0.1 seconds
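Reporting a predicted class and confidence amounts to taking the argmax of the softmax output and its probability. A minimal sketch, assuming the class indices follow the order of the class list in the Dataset section (the post does not state the index mapping) and using a made-up probability vector whose top entry mirrors the reported 46%; the other values are invented.

```python
import numpy as np

# Assumed index order: follows the class list in the Dataset section
CLASS_NAMES = [
    "Completely round smooth galaxy",
    "In-between smooth galaxy",
    "Cigar-shaped smooth galaxy",
    "Edge-on galaxy (no bulge)",
    "Edge-on galaxy (with bulge)",
    "Spiral galaxy",
    "Galaxy with bar",
    "Galaxy with no bulge",
    "Galaxy with just noticeable bulge",
    "Galaxy with obvious bulge",
]


def decode_prediction(probs, class_names=CLASS_NAMES):
    """Return (class name, confidence) for a single softmax output vector."""
    probs = np.asarray(probs, dtype=float)
    best = int(np.argmax(probs))
    return class_names[best], float(probs[best])


# Hypothetical softmax output for one image (sums to 1)
probs = [0.05, 0.04, 0.01, 0.02, 0.03, 0.46, 0.20, 0.07, 0.07, 0.05]
name, conf = decode_prediction(probs)
print(f"{name}: {conf:.0%}")  # Spiral galaxy: 46%
```

With the trained model, `probs` would come from something like `model.predict(image[np.newaxis])[0]`, since Keras expects a leading batch dimension.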
While 46% confidence is moderate, it is well above the 10% that a uniform guess over ten classes would give, and it reflects the inherent difficulty of galaxy classification, where morphological boundaries can be subtle and subjective even for human experts.
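Technical Stack
- TensorFlow / Keras: model definition and training
- NumPy: array handling
- OpenCV: image resizing
- Google Colab: development environment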
Key Learnings
- Morphological Classification Complexity: Galaxy classification is inherently challenging because morphological features vary continuously and class boundaries are subjective.
- Data Augmentation Potential: Data augmentation could improve generalization and help the model handle variations in galaxy orientation.
- Transfer Learning Opportunities: Pre-trained models could improve performance, especially given the limited dataset size.
- Validation Strategy: Validation accuracy stayed relatively stable, and the test accuracy (73.98%) closely matched the best validation accuracy (72.66%), suggesting the model learned features that generalize; the roughly 16-point gap to the final training accuracy (88.87%), however, indicates some overfitting.
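Because a galaxy's orientation on the sky is arbitrary, 90° rotations and mirror flips produce new images with the same morphological label. A minimal NumPy sketch of this orientation augmentation, generating the eight rotation/flip variants of one image; the function name is hypothetical, and in a Keras pipeline the `tf.keras.layers.RandomFlip` and `tf.keras.layers.RandomRotation` layers offer random flips and arbitrary-angle rotations instead.

```python
import numpy as np


def dihedral_variants(img):
    """Return the 8 rotations/flips of an (H, W, C) image.

    Rotations act on the two spatial axes only, so the channel axis
    is left untouched.
    """
    variants = []
    for base in (img, np.flip(img, axis=1)):  # original and mirror image
        for k in range(4):  # 0, 90, 180, 270 degrees
            variants.append(np.rot90(base, k=k, axes=(0, 1)))
    return variants


# Toy 69x69 RGB "image" with random pixel values
rng = np.random.default_rng(0)
img = rng.random((69, 69, 3), dtype=np.float32)
augmented = dihedral_variants(img)
print(len(augmented), augmented[1].shape)  # 8 (69, 69, 3)
```

Since the images are square, every variant keeps the 69×69×3 input shape the model expects.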
Future Improvements
- Data Augmentation: Implement rotation, scaling, and brightness variations
- Transfer Learning: Experiment with pre-trained CNN backbones
- Ensemble Methods: Combine multiple models for improved accuracy
- Attention Mechanisms: Incorporate attention layers to focus on relevant morphological features
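One simple form of the ensemble idea is to average the softmax outputs of several independently trained models and take the argmax of the mean. A sketch on hypothetical probability arrays (no trained models are implied):

```python
import numpy as np


def ensemble_predict(prob_list):
    """Average per-model softmax outputs of shape (n_images, n_classes).

    Returns the averaged probabilities and the predicted class per image.
    """
    mean_probs = np.mean(np.stack(prob_list), axis=0)
    return mean_probs, np.argmax(mean_probs, axis=1)


# Hypothetical outputs of three models for two images over three classes
model_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model_b = np.array([[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]])
model_c = np.array([[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]])
mean_probs, labels = ensemble_predict([model_a, model_b, model_c])
print(labels)  # [0 2]
```

Here the models disagree on each image individually, but the averaged probabilities settle on one class per image, which is the variance-reducing effect ensembles rely on.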
This project was developed using Google Colab and uses the Galaxy10 DECaLS dataset, which combines high-quality DECaLS imaging with Galaxy Zoo volunteer classifications. It succeeds the original Galaxy10 dataset, which was built from Sloan Digital Sky Survey (SDSS) images.