
Domain-Specific Foundation Models

Fine-Tuning Synativ's Geospatial Foundation Model



Geospatial data is particularly challenging due to the amount of data produced, the variety of formats (e.g., the number of bands), and the varying resolutions at which it is collected. Traditional models require large amounts of labelled data, but annotating geospatial images is slow, expensive, and requires domain experts.

In general, fine-tuning visual foundation models is a data-efficient alternative to training models from scratch. However, these models are seldom pre-trained on enough geospatial imagery to generalize well to these use cases. Synativ has therefore released a foundation model specifically for geospatial applications, which you can use as a starting point to fine-tune a model for your particular application with relatively few images. Unlike common image segmentation models, ours can process up to 6 bands and take the time dimension into account, so you can fine-tune it on data collected across several seasons or years.

In this tutorial, we demonstrate how to fine-tune Synativ's geospatial foundation model for a flood detection use case. We use the free and publicly available Sen1Floods11 dataset.

When should you fine-tune Synativ's geospatial foundation model?

☑️ You are working on semantic segmentation for geospatial applications.

☑️ You have a small number of labelled satellite images available for your specific application.

If you have a lot of proprietary data, Synativ can also help you to train your own foundation model.

Setting up Synativ

Make sure that you have installed the Synativ SDK before you authenticate with your API key:

from synativ.api import Synativ

synativ_api: Synativ = Synativ(api_key="{YOUR_API_KEY}")

Preparing your data

Dataset format

Before uploading your data to our cloud, your data folder should be structured in the following way:

    ├── train
        ├── ground_truth
            └── 000.tif
            └── 001.tif
            └── xxx.tif
        └── input
            └── 000.tif
            └── 001.tif
            └── xxx.tif
    ├── val
        ├── ground_truth
            └── 000.tif
            └── 001.tif
            └── xxx.tif
        └── input
            └── 000.tif
            └── 001.tif
            └── xxx.tif
    └── test
        ├── ground_truth
            └── 000.tif
            └── 001.tif
            └── xxx.tif
        └── input
            └── 000.tif
            └── 001.tif
            └── xxx.tif
  • The images need to be provided as .tif and must have 13 bands as per Sentinel / HLS (Harmonized Landsat Sentinel) standard. Any "no data" pixel in images should be set to -9999. The model will automatically extract bands 2, 3, 4, 9, 12 and 13.
  • The file names may differ from those shown above, but every input image needs a corresponding ground_truth file with the same file name and extension. There is no limit to the number of samples.
  • The labels also need to be supplied as .tif but with a single channel. Any "no data" pixel in the labels should be set to -1.

The original model has been pre-trained on tiles of size 512x512 px. If your images have a different size, the model will resize them when consuming the data.
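Before uploading, it can help to confirm that every input file has a matching label. The following is a minimal sketch using only the Python standard library; the function name find_unpaired_files is hypothetical and not part of the Synativ SDK, and it only checks file names, not band counts or "no data" values.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def find_unpaired_files(data_dir):
    """Report splits whose input and ground_truth folders do not match.

    Returns a dict keyed by split name; an empty dict means every .tif in
    `input` has a ground_truth file with the same name, and vice versa.
    """
    root = Path(data_dir)
    problems = {}
    for split in SPLITS:
        input_dir = root / split / "input"
        label_dir = root / split / "ground_truth"
        missing_folders = [d.name for d in (input_dir, label_dir) if not d.is_dir()]
        inputs = {p.name for p in input_dir.glob("*.tif")}
        labels = {p.name for p in label_dir.glob("*.tif")}
        if missing_folders or inputs != labels:
            problems[split] = {
                "missing_folders": missing_folders,
                "missing_ground_truth": sorted(inputs - labels),
                "missing_input": sorted(labels - inputs),
            }
    return problems
```

Calling find_unpaired_files on your data folder and checking that the result is empty is a cheap way to catch naming mismatches before a long upload.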

Uploading your data

To use proprietary data, you need to create a Synativ Dataset and give it a friendly name. It will automatically zip your data folder and upload it upon creation.

from synativ import Dataset

dataset: Dataset = synativ_api.create_dataset(

This will return a Dataset with a few details, most importantly a DatasetId that looks like this: synativ-dataset-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyy. More info on Synativ Datasets can be found here.

Fine-tuning your model

In this tutorial, we fine-tune on the Sen1Floods11 dataset, with approximately 250 training images. The model comes with sensible default hyperparameters for this task, but you can pass your own if needed (see below).

Starting your fine-tuning job

The fine-tuning API takes three arguments:

  • base_model: the foundation model that is fine-tuned, here base_model=synativ_geospatial.

  • dataset_id: the ID received when uploading the dataset.

  • metadata: a dictionary with fine-tuning hyperparameters. Their default values for this tutorial are the following:

    metadata = {
        "num_classes": 2,
        "CLASSES": "[0, 1]",
        "num_frames": 1,
        "img_norm_cfg": {
        "means": [
        "stds": [
      "ignore_index": 2,
      "epochs": 100,            # training only
      "lr": 1.5e-5,             # training only
      "weight_decay": 0.05,     # training only
      "frozen_backbone": false  # training only
  • If you are performing binary segmentation, you should set num_classes = 2 and CLASSES = [0, 1]. The background class must be included in CLASSES. Always ensure that num_classes and CLASSES are consistent with each other and with the labels you provide.

  • num_frames corresponds to the number of observations per location over time. For tasks that have a temporal dimension (multiple frames at different times over weeks, months, or years), this value allows you to define how many frames to use per sample.

  • As explained in "Preparing your data", any areas to ignore in the labels should receive value -1. Under the hood, this value is converted to ignore_index (default 2 for binary segmentation) for training and inference. If you have more than two classes, you need to increment ignore_index accordingly. You can also use ignore_index to ignore a specific class during training and evaluation.

  • img_norm_cfg should be structured like a Python dict with two keys: means and stds (plural in both cases). The values are lists of length 6, corresponding to the mean and standard deviation of each of the bands used by the model, namely 2, 3, 4, 9, 12 and 13. The values above are the defaults for the Sen1Floods11 dataset. Unlike in natural image segmentation, we cannot presume that the statistics will be similar to ImageNet's, so you should compute them for each of those six bands on your own training dataset and update the hyperparameters accordingly.

    We plan to include the ability to automatically compute the means and standard deviations by channel of your dataset in the future.

  • frozen_backbone is a boolean that determines whether the ViT backbone should be frozen during training. If false (default), it is advised to use a relatively large value for weight_decay (default 0.05). Conversely, if true, you can reduce weight_decay (e.g., 0.001).
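The per-band statistics for img_norm_cfg can be computed with a short script. The sketch below is one way to do it under stated assumptions: band numbers are treated as 1-based (band 2 is the second band of the 13), "no data" pixels (-9999) are excluded from the statistics, and each image is already loaded as a sequence of 13 bands, each a flat sequence of pixel values. Reading the .tif files is left to whichever raster library you use; band_statistics is a hypothetical helper, not an SDK function.

```python
import math

MODEL_BANDS = (2, 3, 4, 9, 12, 13)  # bands the model extracts, per the tutorial
NO_DATA = -9999                     # "no data" value for input images


def band_statistics(images, bands=MODEL_BANDS, no_data=NO_DATA):
    """Compute per-band means and standard deviations over a set of images.

    `images` is an iterable of images. Each image is a sequence of bands and
    each band is a flat sequence of pixel values. Band numbers are assumed to
    be 1-based, so band 2 is image[1].
    """
    count = {b: 0 for b in bands}
    total = {b: 0.0 for b in bands}
    total_sq = {b: 0.0 for b in bands}

    for image in images:
        for b in bands:
            for value in image[b - 1]:
                if value == no_data:
                    continue  # skip "no data" pixels
                count[b] += 1
                total[b] += value
                total_sq[b] += value * value

    means, stds = [], []
    for b in bands:
        if count[b] == 0:
            raise ValueError(f"band {b} has no valid pixels")
        mean = total[b] / count[b]
        # Population variance; clamp tiny negative values from rounding.
        variance = max(total_sq[b] / count[b] - mean * mean, 0.0)
        means.append(mean)
        stds.append(math.sqrt(variance))
    return {"means": means, "stds": stds}
```

The returned dict has the same two keys as img_norm_cfg, with one value per band in the order 2, 3, 4, 9, 12, 13. Accumulating sums image by image keeps memory use low, which matters for large training sets.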

You can start fine-tuning by calling fine_tune:

from synativ import Model

model: Model = synativ_api.fine_tune(

This will initiate a fine-tuning job in our backend. Note that metadata is a JSON string through which the user can set hyperparameters for the particular job. If left empty, the Synativ default parameters are used.
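To keep num_classes, CLASSES and ignore_index consistent when overriding the defaults, the metadata can be generated rather than typed by hand. This is a minimal sketch, not part of the SDK: build_metadata is a hypothetical helper, it assumes class ids run from 0 to num_classes - 1, and it assumes ignore_index is the first unused class id (2 for binary segmentation, as in the defaults above).

```python
import json


def build_metadata(num_classes, **overrides):
    """Build a metadata dict whose class settings agree with each other."""
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2 (background included)")
    metadata = {
        "num_classes": num_classes,
        # The tutorial's defaults store CLASSES as a string, e.g. "[0, 1]".
        "CLASSES": str(list(range(num_classes))),
        # Assumption: ignore_index is the first unused class id.
        "ignore_index": num_classes,
    }
    metadata.update(overrides)  # e.g. epochs, lr, img_norm_cfg
    return metadata


# Example: three classes, shorter training run, frozen backbone.
metadata = build_metadata(3, epochs=50, frozen_backbone=True, weight_decay=0.001)
metadata_json = json.dumps(metadata)  # JSON string form of the same settings
```

Both the dict and its JSON string are produced here; pass whichever form your version of the SDK expects.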

You will receive a Model object as response:

  creation_time='2023-08-07 13:16:02.992559',

The SDK will always return the full list of configurable hyperparameters used in metadata even if they were not overwritten by the user.

Monitoring your fine-tuning job

You can check the status of your fine-tuning job by calling get_model_status with the ID of the model returned by fine_tune:


This will return a Status object with one of the following:

Status(status='NOT_FOUND')          ## Wrong inference id
Status(status='QUEUED')             ## Job is queued
Status(status='SETTING_UP')         ## Job is setting up
Status(status='DOWNLOADING_DATA')   ## Downloading data and fine-tuned model
Status(status='RUNNING_INFERENCE')  ## Inference in progress
Status(status='SAVING_RESULTS')     ## Saving inference results
Status(status='COMPLETED')          ## Inference has completed
Status(status='FAILED')             ## Inference has failed

Fine-tuning the model on the Sen1Floods11 dataset should take approximately two hours on our default GPUs for the default 100 epochs.
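Since a job can run for hours, you may prefer to poll the status in a loop instead of checking by hand. The sketch below is SDK-agnostic: wait_for_job is a hypothetical helper that takes any zero-argument callable returning the status string. In practice that callable would wrap get_model_status and read the status field of the returned Status object; the exact call arguments are not shown here.

```python
import time

# Statuses after which polling can stop, taken from the list above.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "NOT_FOUND"}


def wait_for_job(get_status, poll_seconds=60.0, timeout_seconds=4 * 3600,
                 sleep=time.sleep):
    """Poll `get_status()` until it returns a terminal status string."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job still in state {status!r} after {timeout_seconds} s"
            )
        sleep(poll_seconds)  # wait before asking again
```

The default timeout of four hours is an arbitrary choice that leaves headroom over the two-hour estimate; adjust it to your dataset size and number of epochs.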

Evaluating your fine-tuned model

Once the model is fine-tuned, we can evaluate how well the model performs by running inference on the test set that was uploaded earlier.

Starting an inference job

You can start inference by calling start_inference:

inference: Inference = synativ_api.start_inference(

This will initiate an inference job in our backend. Note that metadata is a JSON string through which the user can set hyperparameters for the particular job. If left empty, the Synativ default parameters are used.

You will receive an Inference object as response:

    creation_time='2023-08-07 13:16:02.992559',

The SDK will always return the full list of configurable hyperparameters used in metadata even if they were not overwritten by the user.

If you changed the hyperparameters for training, make sure that you pass consistent hyperparameters for inference. Please refer to the reference list in "Starting your fine-tuning job". Any hyperparameter marked "training only" is irrelevant for inference and must not be passed, as it will cause an error.
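One way to keep the two sets of hyperparameters in sync is to derive the inference metadata from the training metadata. A minimal sketch, assuming the training-only keys are exactly the four marked as such in the defaults above; inference_metadata is a hypothetical helper name.

```python
# Keys marked "training only" in the default metadata.
TRAINING_ONLY_KEYS = {"epochs", "lr", "weight_decay", "frozen_backbone"}


def inference_metadata(training_metadata):
    """Return a copy of the training metadata without training-only keys."""
    return {
        key: value
        for key, value in training_metadata.items()
        if key not in TRAINING_ONLY_KEYS
    }


training = {"num_classes": 2, "CLASSES": "[0, 1]", "ignore_index": 2,
            "epochs": 50, "lr": 1.5e-5}
print(inference_metadata(training))
```

The original dict is left untouched, so the same training metadata can be reused for later fine-tuning runs.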

Although inference jobs generally are much faster, you can monitor them in the same way as fine-tuning jobs. More info can be found here.

Downloading the results

Once the inference job is 'COMPLETED', the predictions can be downloaded by calling download_inference_results:


Once your download is completed, you will find the results saved in <inference_id>.tar.gz in local_dir.
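The archive can be unpacked with Python's standard tarfile module. This is a sketch only: extract_results is a hypothetical helper, and the layout of the files inside the archive is not specified here, so the function simply lists whatever it finds.

```python
import tarfile
from pathlib import Path


def extract_results(archive_path, output_dir):
    """Unpack a .tar.gz archive and return the sorted paths of extracted files."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Use the safer "data" extraction filter where Python supports it.
            archive.extractall(output_dir, filter="data")
        else:
            archive.extractall(output_dir)
    return sorted(p for p in output_dir.rglob("*") if p.is_file())
```

For example, extract_results(Path(local_dir) / f"{inference_id}.tar.gz", "predictions") would unpack the download into a predictions folder, where local_dir and inference_id are the values you used for the download.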

Results overview

On this task with the default hyperparameters, the model achieves a mean Intersection over Union (mIoU) of around 90% and a mean accuracy over both classes of over 95%.
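If you want to recompute these metrics from the downloaded predictions, the standard definitions can be sketched as follows. This is an illustration of mIoU and mean class accuracy in general, not necessarily the exact procedure Synativ uses; segmentation_metrics is a hypothetical helper that expects predictions and labels as flat sequences of class ids, with ignored label pixels already mapped to ignore_index.

```python
def segmentation_metrics(predictions, labels, num_classes=2, ignore_index=2):
    """Return (mIoU, mean class accuracy) for flat sequences of class ids."""
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    label_count = [0] * num_classes

    for pred, label in zip(predictions, labels):
        if label == ignore_index:
            continue  # ignored pixels do not count toward any metric
        label_count[label] += 1
        if 0 <= pred < num_classes:
            pred_count[pred] += 1
        if pred == label:
            intersection[label] += 1

    ious, accuracies = [], []
    for c in range(num_classes):
        union = pred_count[c] + label_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)
        if label_count[c] > 0:
            accuracies.append(intersection[c] / label_count[c])
    if not ious or not accuracies:
        raise ValueError("no valid (non-ignored) pixels to evaluate")
    return sum(ious) / len(ious), sum(accuracies) / len(accuracies)
```

Classes that never appear in either the predictions or the labels are left out of the mean rather than counted as zero, which is one common convention; other tools may handle that case differently.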

This example from the test set shows that its predictions match the ground truth closely:


Using your fine-tuned geospatial model

You can now start hosting your fine-tuned model for real-time inference - read more here.

Let us know if you would prefer us to host your model.