
PlantVision Disease Detection

An end-to-end deep learning system that detects and classifies plant diseases from photographs of leaves.

PyTorch · Computer Vision · FastAPI · Docker · Flutter · MLOps


The Problem & Solution

Crop pests and diseases are a primary threat to global food security, with estimated losses of up to 40% of global crop production each year. For millions of small-scale farmers and home gardeners, early and accurate disease identification is the key to effective management, but access to expert agricultural advice is often limited and expensive.

PlantVision addresses this challenge by democratizing plant pathology. It uses an accurate, efficient deep learning model to provide instant disease diagnosis from a simple photograph of a plant leaf. The system is designed for accessibility, with an offline-first mobile app for fieldwork and a scalable API for broader integration.


System Architecture

The project is architected as a robust, production-ready system with three core components:

  1. PyTorch Classification Model: The heart of the system is an EfficientNet model fine-tuned for high accuracy on plant leaf images.
  2. FastAPI Backend: A high-performance, containerized REST API serves the model, handling image preprocessing and inference requests.
  3. Cross-Platform Mobile App: A Flutter-based application that allows users to perform offline inference directly on their device using a quantized version of the model.
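To make "a quantized version of the model" concrete, here is a minimal pure-Python sketch of 8-bit affine quantization, the general idea behind shrinking weights for on-device inference. It is a generic illustration, not the project's actual quantization method or tooling; the function names and sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float, int]:
    """Affine quantization of floats to int8 codes in [-128, 127]."""
    # Extend the range to include zero so that 0.0 maps to an exact code.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # all weights are zero; any scale works
    zero_point = round(-128 - w_min / scale)
    # Round to the nearest code and clamp to the int8 range.
    codes = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_int8(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [scale * (c - zero_point) for c in codes]


# Hypothetical weights, for illustration only.
weights = [-0.42, 0.0, 0.13, 0.91]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(codes, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight is stored in one byte instead of four, at the cost of a rounding error of roughly half a quantization step per value. This is the size-versus-accuracy trade-off that on-device deployment has to manage.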

A high-level view of the system architecture. Placeholder for diagram.


Dataset & Preprocessing

The model was trained on the PlantVillage Dataset, a public benchmark containing over 54,000 images of healthy and diseased leaves across 14 plant species and 38 distinct classes. To build a model that is robust to real-world variations, a strong augmentation pipeline was crucial.

  • Geometric Augmentations: Random resized crops, flips, and rotations.
  • Color Augmentations: Adjustments to brightness, contrast, and saturation.
  • Normalization: Images were normalized using ImageNet's mean and standard deviation.
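The flip and normalization steps listed above can be sketched in plain Python on a nested-list image. This only illustrates the arithmetic: the project's real pipeline is not shown in the source, and the helper names and the tiny sample image are assumptions. The mean and standard deviation are the commonly used ImageNet RGB statistics for pixel values scaled to [0, 1].

```python
import random
from typing import List

# Commonly used ImageNet channel statistics (RGB, pixel values in [0, 1]).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

Image = List[List[List[float]]]  # rows x columns x [R, G, B], values in 0..255


def random_horizontal_flip(image: Image, rng: random.Random, p: float = 0.5) -> Image:
    """Mirror every row left-to-right with probability p."""
    if rng.random() < p:
        return [list(reversed(row)) for row in image]
    return image


def normalize(image: Image) -> Image:
    """Scale pixels to [0, 1], then standardize each channel."""
    return [
        [
            [(pixel[c] / 255.0 - IMAGENET_MEAN[c]) / IMAGENET_STD[c] for c in range(3)]
            for pixel in row
        ]
        for row in image
    ]


# Hypothetical 1x2 RGB image: one white pixel followed by one black pixel.
image = [[[255.0, 255.0, 255.0], [0.0, 0.0, 0.0]]]
flipped = random_horizontal_flip(image, random.Random(0), p=1.0)
normalized = normalize(flipped)
```

Random augmentations such as the flip are normally applied only during training, while normalization is applied at both training and inference time so that the model always sees inputs on the same scale.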

Performance Evaluation

The final model was evaluated on a held-out test set (20% of the total data).

| Metric | Score | Description |
|---|---|---|
| Accuracy | 98.7% | Overall percentage of correct predictions. |
| Precision | 97.2% | Ability of the model not to label a negative sample as positive. |
| Recall | 96.5% | Ability of the model to find all the positive samples. |
| F1 Score | 96.8% | The harmonic mean of Precision and Recall. |

Note: These are weighted averages across all 38 classes to account for moderate class imbalance in the dataset.
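As a rough illustration of what "weighted averages across all classes" means, the sketch below computes per-class precision, recall, and F1, then averages them with weights proportional to each class's share of the true labels. The function name and the toy labels are hypothetical; the project's own evaluation code is not shown in the source.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Support-weighted precision, recall and F1, plus plain accuracy."""
    support = Counter(y_true)          # number of true samples per class
    predicted = Counter(y_pred)        # number of predictions per class
    total = len(y_true)
    precision = recall = f1 = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        p_c = tp / predicted[cls] if predicted[cls] else 0.0
        r_c = tp / n
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n / total             # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy, imbalanced example: four "healthy" leaves and two "rust" leaves.
y_true = ["healthy", "healthy", "healthy", "healthy", "rust", "rust"]
y_pred = ["healthy", "healthy", "healthy", "rust", "rust", "healthy"]
metrics = weighted_metrics(y_true, y_pred)
```

Weighting by support keeps large classes from being under-counted, whereas an unweighted (macro) average would treat every class equally regardless of how many samples it has.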


Mathematical Formulation

The core of the classification task is minimizing the Cross-Entropy Loss, which is ideal for multi-class problems. The function is defined as:

$$L_{CE} = - \sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

Where:

  • $C$ is the total number of classes (38 in our case).
  • $y_i$ is the binary indicator (0 or 1) if class label $i$ is the correct classification.
  • $\hat{y}_i$ is the predicted probability for class $i$.
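The loss above can be evaluated by hand for a single sample. The sketch below, in plain Python, turns raw model scores (logits) into probabilities with a softmax and then applies the formula with a one-hot label; the logits and the chosen class index are made up for illustration.

```python
import math
from typing import List

NUM_CLASSES = 38  # C in the formula


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true: List[float], y_prob: List[float]) -> float:
    """L_CE = -sum_i y_i * log(y_hat_i); terms with y_i == 0 contribute nothing."""
    return -sum(y * math.log(p) for y, p in zip(y_true, y_prob) if y > 0)


# One-hot label: class 3 is the correct one (arbitrary choice).
one_hot = [0.0] * NUM_CLASSES
one_hot[3] = 1.0

# An uninformed model spreads probability evenly over all classes.
uniform_loss = cross_entropy(one_hot, softmax([0.0] * NUM_CLASSES))

# A model that favors the correct class gets a lower loss.
logits = [0.0] * NUM_CLASSES
logits[3] = 5.0
confident_loss = cross_entropy(one_hot, softmax(logits))
```

With a one-hot label the sum reduces to the negative log-probability of the correct class, so a uniform guess over 38 classes gives a loss of ln(38) ≈ 3.64, and the loss approaches 0 as the predicted probability of the correct class approaches 1.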

Performance Metrics

  • Accuracy: 98.7%
  • Precision: 97.2%
  • Recall: 96.5%
  • F1 Score: 96.8%
  • Top-2 Accuracy: 99.4%
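Top-2 accuracy counts a prediction as correct when the true class is among the two highest-scoring classes, which is useful when diseases look alike. A minimal sketch of the computation, with a hypothetical function name and made-up scores:

```python
from typing import List, Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 2) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Four toy samples over three classes (scores are illustrative only).
scores = [
    [0.7, 0.2, 0.1],  # true class 0: ranked first
    [0.4, 0.5, 0.1],  # true class 0: ranked second
    [0.2, 0.3, 0.5],  # true class 1: ranked second
    [0.1, 0.2, 0.7],  # true class 0: ranked third
]
labels = [0, 0, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

In this toy data only the first sample is right at top-1, while three of the four are right at top-2, which mirrors how the top-2 figure is always at least as high as plain accuracy.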

Project Structure

Project Root

PlantVision_cv001dd/
├── configs/                    # The Recipe Book: All experiment parameters.
│   ├── data_config.yaml
│   ├── model_config.yaml
│   └── train_config.yaml
├── data/                       # All project data.
│   ├── processed/              # Cleaned, transformed data for training.
│   ├── raw/                    # Original, immutable data.
│   └── README.md               # Information about the datasets.
├── logo/
│   └── logo.png
├── mlruns/                     # MLflow experiment runs.
├── notebooks/                  # The Lab Notebook: For exploration and analysis.
├── outputs/                    # Outputs from the system.
│   ├── best_model.pth
│   ├── class_names.json
│   ├── classification_report.txt
│   └── confusion_matrix.png
├── scripts/
│   └── run_docker.sh
├── src/                        # All PlantVision source code.
│   └── PlantVision/            # The core installable package for our project.
│       ├── data/
│       │   ├── __init__.py
│       │   ├── loader.py
│       │   └── transforms.py
│       ├── models/
│       │   └── efficientnet/
│       │       ├── __init__.py
│       │       └── EfficientNet.py
│       ├── __init__.py
│       ├── evaluate.py         # Entrypoint for evaluation.
│       ├── paths.py
│       ├── predict.py          # Entrypoint for prediction.
│       ├── train.py            # Entrypoint for training.
│       └── utils.py
├── tests/                      # Unit tests to verify that all components work as intended.
│   ├── data/
│   │   ├── __init__.py
│   │   ├── test_loader.py
│   │   └── test_transforms.py
│   ├── models/
│   │   └── efficientnet/
│   │       ├── __init__.py
│   │       └── test_efficientnet.py
│   ├── __init__.py
│   ├── conftest.py             # Creates a simple temporary dummy project structure similar to this one.
│   ├── test_evaluate.py
│   ├── test_predict.py
│   └── test_train.py
├── .dockerignore               # Tells Docker what to ignore.
├── Dockerfile                  # The Shipping Container: For reproducibility.
├── LICENSE
├── pytest.ini                  # Suppresses warnings during unit tests.
├── README.md                   # Project explanation.
├── requirements.txt            # PlantVision project dependencies.
└── setup.py                    # Build script for the project.

Roadmap & Challenges

Challenges

  • **Domain Shift**: Ensuring the model generalizes well to images taken with different cameras and lighting not present in the training data.
  • **Class Ambiguity**: Some diseases present very similar visual symptoms, making them difficult for the model to distinguish.
  • **Model Size vs. Accuracy**: Continually balancing the trade-off between performance and resource footprint for efficient on-device inference.

Future Work

  • **Expand Dataset**: Incorporate more plant species and disease varieties, including user-submitted images with a verification pipeline.
  • **Disease Severity Estimation**: Move beyond classification to regression, estimating the severity of the disease (e.g., % leaf area affected).
  • **CI/CD for Model Retraining**: Implement a full MLOps pipeline to automatically retrain and deploy the model as new data becomes available.