Domain-Specific Foundation Models

Fine-Tuning Synativ's Foundation Model for Agriculture Drone Surveying

Agriculture in the 21st century faces new challenges driven by population growth, tightening environmental regulations, and rising production costs. Optimising production has become imperative and requires new tools such as drones to perform field surveys and enable surgical interventions. The images captured in this way still need to be analysed, and doing so manually is slow and expensive, which makes visual foundation models an attractive alternative. Synativ's VFM for agriculture drone surveying can be used as a starting point to fine-tune a model for your particular application, even with a modest number of annotated images.

In this tutorial, we demonstrate how to fine-tune Synativ's foundation model for weed vs crop segmentation. We use an open dataset provided by INRAE.

When should you fine-tune Synativ's foundation model for agriculture drone surveying?

☑️ You are working on semantic segmentation of top-view field images captured by drone.

☑️ You have a small number of labelled images available for your specific application.

Setting up Synativ

Make sure that you have installed the Synativ SDK before you authenticate with your API key:

from synativ.api import Synativ

synativ_api: Synativ = Synativ(api_key="{YOUR_API_KEY}")

Preparing your data

In this tutorial, we fine-tune Synativ's foundation model for agriculture drone surveying using the INRAE crop vs weed dataset. The dataset contains images in two splits; each image has dimensions 1196x804 pixels. Split 1 contains 176 labelled images and will be used for training, while split 2 contains 124 labelled images that we divide equally between validation and test.

After the images are assigned to one of the train/val/test splits, we cut each into six overlapping tiles of 512x512 pixels, giving us 1,800 patches in total: 1,056 for training and 372 for each of validation and test.
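If you are preparing the tiles yourself, a short script along the following lines can do the cutting. This is a minimal sketch, assuming Pillow is installed; the raw/ and data/ paths are placeholders, and the offsets below are simply one way to cover a 1196x804 image with six overlapping 512x512 tiles.

from pathlib import Path
from PIL import Image

TILE = 512
# Top-left corners of the six tiles: 3 columns x 2 rows covering 1196x804 exactly.
X_OFFSETS = [0, 342, 684]   # 684 + 512 = 1196
Y_OFFSETS = [0, 292]        # 292 + 512 = 804

def tile_image(src: Path, dst_dir: Path) -> None:
    """Cut one image (or mask) into six overlapping 512x512 tiles."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    image = Image.open(src)
    for i, y in enumerate(Y_OFFSETS):
        for j, x in enumerate(X_OFFSETS):
            tile = image.crop((x, y, x + TILE, y + TILE))
            tile.save(dst_dir / f"{src.stem}_{i}{j}{src.suffix}")

# Example: tile every training input (run the same loop for the masks).
for path in sorted(Path("raw/train/inputs").glob("*.png")):
    tile_image(path, Path("data/train/inputs"))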

Dataset format

Before uploading your data to our cloud, your data folder should be structured in the following way:

data
├── train
│   ├── ground_truth
│   │   ├── 000.png
│   │   ├── 001.png
│   │   └── xxx.png
│   └── inputs
│       ├── 000.png
│       ├── 001.png
│       └── xxx.png
├── val
│   ├── ground_truth
│   │   ├── 000.png
│   │   ├── 001.png
│   │   └── xxx.png
│   └── inputs
│       ├── 000.png
│       ├── 001.png
│       └── xxx.png
└── test
    └── inputs
        ├── 000.png
        ├── 001.png
        └── xxx.png

The file names are allowed to differ from what is shown above (extensions .jpg, .jpeg, and .png are accepted), but every image in train/inputs and val/inputs needs a corresponding mask in ground_truth with the same file name and extension. There is no limit to the number of samples.

The ground-truth masks should be grayscale and encoded ordinally, i.e., a pixel value of 1 indicates class 1, a value of 2 indicates class 2, and so on. 0 is reserved for the background class.
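If your raw annotations are colour images rather than ordinal masks, they can be converted before uploading. The palette below is purely illustrative (it is not the actual encoding of the INRAE annotations); replace it with the colours used in your own labels.

import numpy as np
from PIL import Image

# Hypothetical palette: map each annotation colour to an ordinal class index.
PALETTE = {
    (0, 0, 0): 0,      # background
    (0, 255, 0): 1,    # crop
    (255, 0, 0): 2,    # weed
}

def to_ordinal_mask(src: str, dst: str) -> None:
    """Convert an RGB annotation into a grayscale mask with values 0..num_classes-1."""
    rgb = np.array(Image.open(src).convert("RGB"))
    mask = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for colour, class_id in PALETTE.items():
        mask[np.all(rgb == colour, axis=-1)] = class_id
    Image.fromarray(mask, mode="L").save(dst)

to_ordinal_mask("raw/train/annotations/000.png", "data/train/ground_truth/000.png")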

The validation set is used to determine the best model seen during training. The test set represents the images that you will run inference on; it can be uploaded at a later stage if you wish and doesn't require labels.

Uploading your data

To use proprietary data, you need to create a Synativ Dataset and give it a friendly name. The SDK will automatically zip your data folder and upload it upon creation.

from synativ import Dataset

dataset: Dataset = synativ_api.create_dataset(
    dataset_name="your_dataset",
    dataset_dir="<path_to_your_dataset>"
)

This will return a Dataset with a few details, but most importantly a DatasetId that looks like this: synativ-dataset-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyy. More info on Synativ Datasets can be found here.

Fine-tuning your model

The model comes with sensible default hyperparameters, but you can pass your own if needed (see below).

Starting your fine-tuning job

The fine-tuning API takes three arguments:

  • base_model: the foundation model that is fine-tuned, here base_model=synativ_agriculture.

  • dataset_id: the ID received when uploading the dataset.

  • metadata: a dictionary with fine-tuning hyperparameters. Their default values for this tutorial are the following:

metadata = {
    "num_epochs": 70,
    "learning_rate": 0.0001,
    "num_classes": 3,
    "ce_weight": [0.08, 0.61, 0.31],
    "dataset_mean": [0.446, 0.436, 0.217],
    "dataset_std": [0.167, 0.182, 0.075]
}

Please make sure that num_classes is adapted to your task; it should include the background class. The ce_weight argument refers to the class weighting in the cross-entropy loss. The default values have been calculated by taking the inverse of the number of pixels of each class in the training set, normalised to add up to 1. If you are using a different dataset, you should update these weights (or set them all to 1). Similarly, the dataset mean and standard deviation have also been calculated on the training set.
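If you fine-tune on your own data, these weights and statistics can be recomputed from the training split. The sketch below is not part of the Synativ SDK; it simply walks the folder layout described above and assumes three classes.

from pathlib import Path
import numpy as np
from PIL import Image

NUM_CLASSES = 3
pixel_counts = np.zeros(NUM_CLASSES, dtype=np.float64)
channel_sum, channel_sq_sum, n_pixels = np.zeros(3), np.zeros(3), 0

# Count pixels per class in the training masks.
for mask_path in Path("data/train/ground_truth").glob("*.png"):
    mask = np.array(Image.open(mask_path))
    pixel_counts += np.bincount(mask.ravel(), minlength=NUM_CLASSES)[:NUM_CLASSES]

# Accumulate per-channel statistics over the training inputs.
for img_path in Path("data/train/inputs").glob("*.png"):
    img = np.array(Image.open(img_path).convert("RGB")) / 255.0
    channel_sum += img.sum(axis=(0, 1))
    channel_sq_sum += (img ** 2).sum(axis=(0, 1))
    n_pixels += img.shape[0] * img.shape[1]

# Inverse-frequency class weights, normalised to sum to 1 (ce_weight).
inv_freq = 1.0 / pixel_counts
ce_weight = (inv_freq / inv_freq.sum()).round(2).tolist()

# Per-channel mean and standard deviation (dataset_mean / dataset_std).
dataset_mean = (channel_sum / n_pixels).round(3).tolist()
dataset_std = np.sqrt(channel_sq_sum / n_pixels - (channel_sum / n_pixels) ** 2).round(3).tolist()

print(ce_weight, dataset_mean, dataset_std)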

You can start fine-tuning by calling fine_tune:

from synativ import Model

model: Model = synativ_api.fine_tune(
    base_model="synativ_agriculture",
    dataset_id=dataset.id,
    metadata={}
)

This will initiate a fine-tuning job in our backend. Note that metadata is a JSON string through which the user can set hyperparameters for the particular job. If left empty, the Synativ default parameters are used.

You will receive a Model object as response:

Model(
  creation_time='2023-08-07 13:16:02.992559',
  checkpoint='',
  metadata='{<used_parameters>}',
  base_model='synativ_agriculture',
  dataset_id='synativ-dataset-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyy',
  id='synativ-model-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
)

The SDK will always return the full list of configurable hyperparameters used in metadata even if they were not overwritten by the user.

Monitoring your fine-tuning job

You can check the status of your fine-tuning job by calling get_model_status with the respective ModelId:

synativ_api.get_model_status(model_id=model.id)

This will return a Status object with one of the following:

Status(status='NOT_FOUND')          ## Wrong job ID
Status(status='QUEUED')             ## Job is queued
Status(status='SETTING_UP')         ## Job is setting up
Status(status='DOWNLOADING_DATA')   ## Downloading data and model
Status(status='RUNNING_INFERENCE')  ## Job is running
Status(status='SAVING_RESULTS')     ## Saving results
Status(status='COMPLETED')          ## Job has completed
Status(status='FAILED')             ## Job has failed

Fine-tuning the model on this data should take approximately four hours on our default GPUs.
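Because the job runs for hours, it can be convenient to poll the status periodically rather than checking by hand. A minimal loop, assuming the returned Status object exposes a status attribute as shown above, might look like this:

import time

# Poll every 10 minutes until the job finishes one way or the other.
while True:
    status = synativ_api.get_model_status(model_id=model.id).status
    print(f"Fine-tuning status: {status}")
    if status in ("COMPLETED", "FAILED", "NOT_FOUND"):
        break
    time.sleep(600)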

Running inference

Once the model is fine-tuned, we can run inference on the test set that was uploaded earlier.

Starting an inference job

You can start inference by calling start_inference:

from synativ import Inference

inference: Inference = synativ_api.start_inference(
    model_id=model.id,
    dataset_id=dataset.id,
    metadata={}

This will initiate an inference job in our backend. Note that metadata is a JSON string through which the user can set hyperparameters for the particular job. If left empty, the Synativ default parameters are used.

You will receive an Inference object as response:

Inference(
    creation_time='2023-08-07 13:16:02.992559',
    metadata='{<used_parameters>}',
    model_id='synativ-model-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
    dataset_id='synativ-dataset-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyy',
    id='synativ-inference-zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzz'
)

The SDK will always return the full list of configurable hyperparameters used in metadata even if they were not overwritten by the user.

Although inference jobs are generally much faster, you can monitor them in the same way as your fine-tuning job. More info can be found here.

Downloading the results

Once the inference job is 'COMPLETED', the predictions can be downloaded by calling download_inference_results:

synativ_api.download_inference_results(
  inference_id=inference.id,
  local_dir='<path_where_you_want_to_save_your_results>'
)

Once the download completes, you will find the results saved as <inference_id>.tar.gz in local_dir.
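The archive can then be unpacked with Python's standard library, for example:

import tarfile
from pathlib import Path

results_dir = Path("<path_where_you_want_to_save_your_results>")
archive = results_dir / f"{inference.id}.tar.gz"

# Extract the predicted masks next to the downloaded archive.
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(results_dir / "predictions")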

Results overview

The model can achieve around 80% in mIoU (mean intersection-over-union).
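If you keep labels for your test tiles, you can compute the same metric on your own predictions. The sketch below accumulates per-class intersections and unions over pairs of ordinal masks and averages the result; the paths are placeholders and the helper is not part of the Synativ SDK.

from pathlib import Path
import numpy as np
from PIL import Image

NUM_CLASSES = 3

def mean_iou(pred_paths, gt_paths, num_classes=NUM_CLASSES):
    """Mean intersection-over-union across classes, accumulated over all mask pairs."""
    intersection = np.zeros(num_classes)
    union = np.zeros(num_classes)
    for pred_path, gt_path in zip(pred_paths, gt_paths):
        pred = np.array(Image.open(pred_path))
        gt = np.array(Image.open(gt_path))
        for c in range(num_classes):
            intersection[c] += np.logical_and(pred == c, gt == c).sum()
            union[c] += np.logical_or(pred == c, gt == c).sum()
    return np.mean(intersection / np.maximum(union, 1))

# Example usage with matching, sorted prediction and ground-truth tiles:
# print(mean_iou(sorted(Path("predictions").glob("*.png")), sorted(Path("test_gt").glob("*.png"))))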

Here are two examples of image/ground truth/prediction from the test set. The predictions are noisier than the ground truth and miss some of the smallest weeds, but overall they are quite accurate and able to distinguish crops from weeds even when they overlap.

[Two test-set examples: input image, ground truth, and prediction]

These predictions could be used to selectively spray herbicides with a high degree of accuracy, thus reducing the economic cost and environmental impact.

Final thoughts

At Synativ, we believe that leveraging large pre-trained models fine-tuned on small annotated datasets has the potential to revolutionise the field of computer vision. This strategy sidesteps data and annotation bottlenecks, enabling quicker deployments across various domains and use cases.

If you would like to experiment yourself, you can request an API key with the button at the top of this page.

Would you like to discuss this further? Get in touch with us!
