MANCompiled

Wavefunctions, weights, and while-loops - Mandeep.

Galaxy Classification with Deep Learning

July 9, 2023

🧠 Building a CNN to classify galaxy morphologies using the Galaxy10 dataset


🌀 Project Overview

This project implements a convolutional neural network (CNN) to classify galaxy images into 10 different morphological categories. Using the Galaxy10 dataset from DECaLS (Dark Energy Camera Legacy Survey), I trained a deep learning model to automatically identify galaxy types based on their visual characteristics.

🌟 Key Results: Achieved 73.98% test accuracy on galaxy classification across 10 distinct morphological categories.

🗂️ Dataset & Methodology

The Galaxy10 dataset contains over 17,000 galaxy images labeled by citizen scientists through the Galaxy Zoo project. Each image is resized to 69×69 pixels for this project and represents one of ten galaxy morphologies:

  1. Completely round smooth galaxy
  2. In-between smooth galaxy
  3. Cigar-shaped smooth galaxy
  4. Edge-on galaxy (no bulge)
  5. Edge-on galaxy (with bulge)
  6. Spiral galaxy
  7. Galaxy with bar
  8. Galaxy with no bulge
  9. Galaxy with just noticeable bulge
  10. Galaxy with obvious bulge

🧱 Model Architecture

The CNN consists of four convolutional layers with progressively increasing filter counts (32, 64, 128, 256, all with 3×3 kernels), followed by global average pooling and dense layers:


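import tensorflow as tf

model = tf.keras.Sequential([
    # Block 1: 32 filters; input is a 69x69 RGB image
    tf.keras.layers.Conv2D(32, (3, 3),
                          activation='relu',
                          input_shape=(69, 69, 3)),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Block 2: 64 filters
    tf.keras.layers.Conv2D(64, (3, 3),
                          activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Block 3: 128 filters
    tf.keras.layers.Conv2D(128, (3, 3),
                          activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # Block 4: 256 filters, no pooling before the global average
    tf.keras.layers.Conv2D(256, (3, 3),
                          activation='relu'),
    # Collapse each feature map to a single value
    tf.keras.layers.GlobalAveragePooling2D(),
    # Classifier head with dropout for regularization
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.4),
    # One softmax output per galaxy class
    tf.keras.layers.Dense(10, activation='softmax')
])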
🧮 Total Parameters: ~1.2M
⏱️ Training Time: ~45 minutes
📈 Final Accuracy: 73.98%
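
As a sanity check on the layer listing above, the output shapes and parameter counts can be tallied by hand. This is a minimal sketch in plain Python, assuming Keras defaults ('valid' padding, stride 1, a bias on every layer); the helper names are illustrative. For the layers exactly as listed the total comes to 422,602, which is below the ~1.2M quoted, so the trained model may have differed from this listing.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


def trace_model(size=69, channels=3, filters=(32, 64, 128, 256)):
    """Walk the listed layers and return (rows, total_params).

    Dropout is skipped: it has no parameters and does not change shapes.
    """
    rows, total = [], 0
    for i, f in enumerate(filters, start=1):
        size -= 2                          # 3x3 conv with 'valid' padding
        params = conv2d_params(3, channels, f)
        rows.append((f"conv{i}", (size, size, f), params))
        total += params
        channels = f
        if i < len(filters):               # no pooling after the last conv
            size //= 2                     # 2x2 max pooling rounds down
            rows.append((f"pool{i}", (size, size, f), 0))
    rows.append(("global_avg_pool", (channels,), 0))
    for name, units in (("dense1", 128), ("dense2", 10)):
        params = dense_params(channels, units)
        rows.append((name, (units,), params))
        total += params
        channels = units
    return rows, total


rows, total = trace_model()
for name, shape, params in rows:
    print(f"{name:16s} {str(shape):16s} {params:>8,d}")
print("total", f"{total:,d}")
```

The spatial size shrinks 69 → 67 → 33 → 31 → 15 → 13 → 6 → 4, so the last convolution sees only a 4×4 map before global average pooling reduces it to 256 values.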

🏋️ Training Process

The model was trained for 20 epochs with the following configuration:
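
The exact settings are not reproduced here. What follows is a minimal sketch of a typical Keras setup for this kind of model, assuming the Adam optimizer, categorical cross-entropy loss, a batch size of 32, and a 10% validation split. Only the 20 epochs comes from the text; everything else, including the helper name `compile_and_fit`, is an assumption.

```python
import tensorflow as tf


def compile_and_fit(model, X_train, y_train, epochs=20,
                    batch_size=32, validation_split=0.1):
    """Compile and train a Keras classifier on one-hot labels.

    Every setting except epochs=20 is an assumption, not the
    configuration actually used for the reported results.
    """
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed
        loss="categorical_crossentropy",  # pairs with softmax + one-hot labels
        metrics=["accuracy"],
    )
    return model.fit(
        X_train, y_train,
        epochs=epochs,
        batch_size=batch_size,               # assumed
        validation_split=validation_split,   # assumed
        verbose=2,
    )
```

Keras takes the validation split from the end of the arrays before shuffling, so the data should be shuffled once beforehand if it is ordered by class. The returned `History` object holds per-epoch `accuracy` and `val_accuracy` values for plotting.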

📊 Training Results

The model showed steady improvement throughout training:

📉 The model demonstrated good learning progression with validation accuracy stabilizing around 72%, indicating successful generalization without significant overfitting.

🧪 Implementation Details

🔄 Data Preprocessing

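import cv2
import numpy as np
import tensorflow as tf
from astroNN.datasets import galaxy10  # import path assumed from astroNN usage

# Load Galaxy10 images and integer labels
X, y = galaxy10.load_data()
# Resize to 69x69 and scale pixels to [0, 1]
X = np.array([cv2.resize(img, (69, 69)) for img in X],
             dtype='float32') / 255.0
# One-hot encode the 10 class labels
y_cat = tf.keras.utils.to_categorical(y, 10)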
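
A reported test accuracy implies a held-out split, which is not shown above. This is a minimal sketch of one way to make it, assuming an 80/20 stratified split done with NumPy; the fraction, the seed, and the helper name `stratified_split` are assumptions. It runs here on synthetic labels rather than the real dataset.

```python
import numpy as np


def stratified_split(y, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        # Shuffle the indices of this class, then cut off the test share
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(len(idx) * test_fraction))
        test_idx.append(idx[:n_test])
        train_idx.append(idx[n_test:])
    return np.concatenate(train_idx), np.concatenate(test_idx)


# Synthetic stand-in for the integer labels: 10 classes, 50 samples each
y = np.repeat(np.arange(10), 50)
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))  # 400 100
```

With the real arrays, the indices would be applied as `X[train_idx]`, `y_cat[train_idx]` and `X[test_idx]`, `y_cat[test_idx]`. Stratifying matters when classes are unevenly represented, because a purely random split can leave a rare class nearly absent from the test set.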

🖼️ Model Testing Results

Test Image Analysis

Here's an example of the model in action on a real galaxy image:

[Figure: original galaxy image used for testing]

📋 Prediction Results

[Figure: model prediction output showing classification results]

Model Prediction:

While the confidence is moderate, this reflects the inherent difficulty in galaxy classification, where morphological boundaries can be subtle and subjective even for human experts.
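
To make "moderate confidence" concrete, the softmax output can be read as a ranked list rather than a single label. This is a minimal sketch in plain Python using the class names listed earlier. It assumes label index 0 corresponds to the first class in that list, and the probability vector is made up for illustration; it is not the model's actual output.

```python
CLASS_NAMES = [
    "Completely round smooth galaxy",
    "In-between smooth galaxy",
    "Cigar-shaped smooth galaxy",
    "Edge-on galaxy (no bulge)",
    "Edge-on galaxy (with bulge)",
    "Spiral galaxy",
    "Galaxy with bar",
    "Galaxy with no bulge",
    "Galaxy with just noticeable bulge",
    "Galaxy with obvious bulge",
]


def top_k(probs, k=3):
    """Return the k most probable (class name, probability) pairs."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(CLASS_NAMES[i], p) for i, p in ranked[:k]]


# Hypothetical softmax output (sums to 1); not a real prediction
example = [0.03, 0.05, 0.01, 0.03, 0.04, 0.48, 0.27, 0.03, 0.04, 0.02]
for name, p in top_k(example):
    print(f"{name}: {p:.0%}")
```

With real predictions, `probs` would be one row of `model.predict(...)`. A runner-up class with a sizeable share of the probability often points to a genuinely ambiguous morphology rather than a model failure.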

⚙️ Technical Stack

Deep Learning: TensorFlow/Keras
Data Processing: NumPy, OpenCV
Dataset: astroNN Galaxy10
Environment: Google Colab

💡 Key Learnings

  1. Morphological Classification Complexity: Galaxy classification is inherently challenging due to the continuous nature of morphological features and subjective classification boundaries.

  2. Data Augmentation Potential: The model could benefit from data augmentation techniques to improve generalization and handle orientation variations.

  3. Transfer Learning Opportunities: Pre-trained models could potentially improve performance, especially given the limited dataset size.

  4. Validation Strategy: The relatively stable validation accuracy suggests the model learned meaningful features without excessive overfitting.
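
The augmentation idea in point 2 can be sketched with flips and 90° rotations, which leave a galaxy's morphology unchanged. This is a minimal sketch in NumPy, assuming square images in (height, width, channels) layout; `dihedral_augment` is an illustrative name, and none of this was part of the trained pipeline.

```python
import numpy as np


def dihedral_augment(image):
    """Return the 8 rotated and mirrored copies of a square (H, W, C) image."""
    variants = []
    for k in range(4):
        # Rotate by k * 90 degrees in the image plane
        rotated = np.rot90(image, k=k, axes=(0, 1))
        variants.append(rotated)
        # Mirror the rotated copy left to right
        variants.append(np.flip(rotated, axis=1))
    return np.stack(variants)


# Random stand-in for one preprocessed 69x69 RGB galaxy image
image = np.random.default_rng(0).random((69, 69, 3)).astype("float32")
batch = dihedral_augment(image)
print(batch.shape)  # (8, 69, 69, 3)
```

Each augmented copy keeps the label of the original image, so the label row would be repeated eight times. Augmentation should be applied to the training split only, so that validation and test accuracy stay comparable.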

🚀 Future Improvements


This project was developed using Google Colab and leverages the Galaxy10 dataset, which combines high-quality DECaLS imaging with Galaxy Zoo classifications originally derived from the Sloan Digital Sky Survey (SDSS) project.