
Domain-Specific Foundation Models

Fine-Tuning Synativ's Pathology Foundation Model



The practical effectiveness of computer vision for pathology has historically depended on large numbers of labelled images, which are expensive to collect because they require clinical trials.

Fine-tuning a visual foundation model is a data-efficient alternative, but such models seldom contain knowledge of the structures found in pathology images. Synativ has therefore trained a foundation model specifically for pathology, which you can use as a starting point to fine-tune a model for your particular application with relatively few images.

The visualisation below shows how Synativ's pathology foundation model (right) focusses on and classifies similar features better than a foundation model trained on a generic dataset (left).

image

In this tutorial, we demonstrate how to fine-tune Synativ's pathology foundation model for a breast cancer use case. We use the free and publicly available BCSS dataset.

When should you fine-tune Synativ's pathology foundation model?

☑️ You are working on a semantic segmentation task for pathology data.

☑️ You have a small number of labelled pathology images available for your specific application.

Setting up Synativ

Make sure that you have installed the Synativ SDK before you authenticate with your API key:

from synativ.api import Synativ

synativ_api: Synativ = Synativ(api_key="{YOUR_API_KEY}")
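
To avoid hard-coding the key in a script, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name SYNATIV_API_KEY and the helper load_api_key are assumptions and are not part of the Synativ SDK.

```python
import os


def load_api_key(env_var: str = "SYNATIV_API_KEY") -> str:
    """Return the API key stored in `env_var` (a hypothetical variable name)."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return api_key


# Usage with the client shown above:
# synativ_api = Synativ(api_key=load_api_key())
```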

Preparing your data

Dataset format

Before uploading your data to our cloud, your data folder should be structured in the following way:

data
├── train
│   ├── ground_truth
│   │   ├── 000.png
│   │   ├── 001.png
│   │   └── xxx.png
│   └── input
│       ├── 000.png
│       ├── 001.png
│       └── xxx.png
├── test
│   ├── ground_truth
│   │   ├── 000.png
│   │   ├── 001.png
│   │   └── xxx.png
│   └── input
│       ├── 000.png
│       ├── 001.png
│       └── xxx.png
└── val  # NOTE: optional
    ├── ground_truth
    └── input

File names may differ from those shown above, and the extensions .jpg, .jpeg, and .png are accepted, but every input image needs a corresponding ground_truth image with the same file name and extension. There is no limit on the number of samples.
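
Before uploading, it can help to check the folder locally. The following is a minimal sketch of such a check, assuming that train and test are required and val is optional, as the layout above suggests. The function find_layout_problems is a hypothetical helper, not part of the Synativ SDK.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def find_layout_problems(data_dir: str) -> list[str]:
    """Return readable problems found in the data folder; an empty list means none."""
    root = Path(data_dir)
    problems: list[str] = []
    for split in ("train", "test", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            if split != "val":  # val is optional
                problems.append(f"missing folder: {split}")
            continue
        input_dir = split_dir / "input"
        truth_dir = split_dir / "ground_truth"
        for sub in (input_dir, truth_dir):
            if not sub.is_dir():
                problems.append(f"missing folder: {split}/{sub.name}")
        if not (input_dir.is_dir() and truth_dir.is_dir()):
            continue
        inputs = {p.name for p in input_dir.iterdir() if p.is_file()}
        truths = {p.name for p in truth_dir.iterdir() if p.is_file()}
        # Every file must use an accepted extension.
        for name in sorted(inputs | truths):
            if Path(name).suffix.lower() not in ALLOWED_EXTENSIONS:
                problems.append(f"unsupported extension: {split}/{name}")
        # Inputs and ground truths must pair up by file name and extension.
        for name in sorted(inputs - truths):
            problems.append(f"no ground truth for {split}/input/{name}")
        for name in sorted(truths - inputs):
            problems.append(f"no input for {split}/ground_truth/{name}")
    return problems
```

Calling find_layout_problems("<path_to_your_dataset>") and printing the returned list shows what needs fixing before the upload.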

The ground-truth images should be grayscale and ordinally encoded, i.e. each pixel value must be its class index: 0 for class 0, 1 for class 1, etc.
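
A common mistake is to save masks with display values such as 0 and 255 instead of class indices. The sketch below checks a mask that has already been loaded into a NumPy array (for example with Pillow, which is an assumption about your tooling). The helper check_mask is hypothetical and not part of the Synativ SDK.

```python
import numpy as np


def check_mask(mask: np.ndarray, num_classes: int) -> None:
    """Raise ValueError unless `mask` is a 2-D array of class indices in [0, num_classes)."""
    if mask.ndim != 2:
        raise ValueError(f"expected a single-channel mask, got shape {mask.shape}")
    if not np.issubdtype(mask.dtype, np.integer):
        raise ValueError(f"expected integer pixel values, got dtype {mask.dtype}")
    values = np.unique(mask)
    invalid = values[(values < 0) | (values >= num_classes)]
    if invalid.size:
        raise ValueError(f"pixel values outside 0..{num_classes - 1}: {invalid.tolist()}")


# Example: a 2x3 mask containing classes 0, 1 and 2 passes silently.
check_mask(np.array([[0, 1, 1], [2, 2, 0]], dtype=np.uint8), num_classes=3)
```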

Uploading your data

To use proprietary data, you need to create a Synativ Dataset and give it a friendly name. It will automatically zip your data folder and upload it upon creation.

from synativ import Dataset

dataset: Dataset = synativ_api.create_dataset(
    dataset_name="your_dataset",
    dataset_dir="<path_to_your_dataset>"
)

This will return a Dataset with a few details, most importantly a DatasetId that looks like this: synativ-dataset-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyy. More info on Synativ Datasets can be found here.

Fine-tuning your model

In this tutorial, we fine-tune Synativ's pathology foundation model on the BCSS dataset. The model comes with sensible default hyperparameters, but you can pass your own if needed (see below).

Starting your fine-tuning job

The fine-tuning API takes three arguments:

  • base_model: the foundation model to fine-tune, here base_model="synativ_pathology".

  • dataset_id: the ID received when uploading the dataset.

  • metadata: a dictionary with fine-tuning hyperparameters. Their default values for this tutorial are the following:

    metadata = {
        "num_epochs": 32,
        "learning_rate": 0.0001,
        "num_classes": 6
    }
    

    Please make sure that num_classes is adapted to your task.
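
One way to override only some of the hyperparameters is to start from the defaults listed above and apply your changes on top. This is a minimal sketch: build_metadata is a hypothetical helper, and rejecting keys other than the three named in this tutorial is a choice made for the example, not documented backend behaviour.

```python
DEFAULT_METADATA = {
    "num_epochs": 32,
    "learning_rate": 0.0001,
    "num_classes": 6,
}


def build_metadata(**overrides) -> dict:
    """Return the tutorial defaults with `overrides` applied, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULT_METADATA)
    if unknown:
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    metadata = {**DEFAULT_METADATA, **overrides}
    if metadata["num_classes"] < 1:
        raise ValueError("num_classes must be at least 1")
    return metadata


metadata = build_metadata(num_classes=3, num_epochs=10)
print(metadata)  # {'num_epochs': 10, 'learning_rate': 0.0001, 'num_classes': 3}
```

The resulting dictionary can then be passed as the metadata argument.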

You can start fine-tuning by calling fine_tune:

from synativ import Model

model: Model = synativ_api.fine_tune(
    base_model="synativ_pathology",
    dataset_id=dataset.id,
    metadata={}
)

This will initiate a fine-tuning job in our backend. Note that metadata is where you set hyperparameters for this particular job; it is passed as a dictionary and appears as a JSON string in the response. If left empty, the Synativ default parameters are used.

You will receive a Model object as response:

Model(
  creation_time='2023-08-07 13:16:02.992559',
  checkpoint='',
  metadata='{<used_parameters>}',
  base_model='synativ_pathology',
  dataset_id='synativ-dataset-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyy',
  id='synativ-model-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
)

The SDK will always return the full list of configurable hyperparameters used in metadata even if they were not overwritten by the user.

Monitoring your fine-tuning job

You can check the status of your fine-tuning job by calling get_model_status with the respective ModelId:

synativ_api.get_model_status(model_id=model.id)

This will return a Status object. The values below are those reported for an inference job (queried with get_inference_status and an InferenceId), which is monitored in the same way:

Status(status='NOT_FOUND')          ## Wrong inference id
Status(status='QUEUED')             ## Job is queued
Status(status='SETTING_UP')         ## Job is setting up
Status(status='DOWNLOADING_DATA')   ## Downloading data and fine-tuned model
Status(status='RUNNING_INFERENCE')  ## Inference in progress
Status(status='SAVING_RESULTS')     ## Saving inference results
Status(status='COMPLETED')          ## Inference has completed
Status(status='FAILED')             ## Inference has failed

Fine-tuning the model on the BCSS dataset should take approximately 7 hours on our default GPUs.
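
Because a job can run for hours, you may prefer to poll its status from a script rather than check by hand. The sketch below is independent of the SDK: wait_until_done is a hypothetical helper that takes any zero-argument callable returning the status string, and it treats COMPLETED, FAILED and NOT_FOUND from the list above as terminal states. The polling interval and timeout are arbitrary example values.

```python
import time
from typing import Callable

TERMINAL_STATES = {"COMPLETED", "FAILED", "NOT_FOUND"}


def wait_until_done(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 12 * 3600,
) -> str:
    """Call `get_status` until it returns a terminal state, then return that state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```

The callable would typically wrap the SDK status call and return the status field of the Status object, assuming that field is exposed as an attribute named status, as the printed output suggests.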

Evaluating your fine-tuned model

Once the model is fine-tuned, we can evaluate how well the model is performing by running inference on the test set that was uploaded earlier.

Starting an inference job

You can start inference by calling start_inference:

from synativ import Inference  # assumed importable like Dataset and Model

inference: Inference = synativ_api.start_inference(
    model_id=model.id,
    dataset_id=dataset.id,
    metadata={}
)

This will initiate an inference job in our backend. Note that metadata is where you set hyperparameters for this particular job; it is passed as a dictionary and appears as a JSON string in the response. If left empty, the Synativ default parameters are used.

You will receive an Inference object as response:

Inference(
    creation_time='2023-08-07 13:16:02.992559',
    metadata='{<used_parameters>}',
    model_id='synativ-model-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx',
    dataset_id='synativ-dataset-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyy',
    id='synativ-inference-zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzz'
)

The SDK will always return the full list of configurable hyperparameters used in metadata even if they were not overwritten by the user.

Although inference jobs are generally much faster, you can monitor them in the same way as your fine-tuning job. More info can be found here.

Downloading the results

Once the inference job is 'COMPLETED', the predictions can be downloaded by calling download_inference_results:

synativ_api.download_inference_results(
  inference_id=inference.id,
  local_dir='<path_where_you_want_to_save_your_results>'
)

Once your download is completed, you will find the results saved in <inference_id>.tar.gz in local_dir.
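
The archive can be unpacked with Python's standard tarfile module. This is a minimal sketch: extract_results is a hypothetical helper, and the internal layout of the archive is not described here, so the function simply returns the member names for you to inspect.

```python
import tarfile
from pathlib import Path


def extract_results(archive_path: str, output_dir: str) -> list[str]:
    """Extract a .tar.gz archive into `output_dir` and return the member names."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can refuse unsafe members such as absolute paths.
            archive.extractall(out, filter="data")
        else:
            archive.extractall(out)
    return names
```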

The predictions match the ground truth closely:

image
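
To put a number on that agreement, a common metric for semantic segmentation is the per-class intersection over union (IoU). The sketch below assumes the predictions are stored as ordinally encoded masks like the ground truth, which is not confirmed here, and that both masks have been loaded into NumPy arrays. The function per_class_iou is a hypothetical helper.

```python
import numpy as np


def per_class_iou(prediction: np.ndarray, ground_truth: np.ndarray, num_classes: int) -> list[float]:
    """Return the IoU for each class; classes absent from both masks get NaN."""
    if prediction.shape != ground_truth.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    ious = []
    for cls in range(num_classes):
        pred_c = prediction == cls
        true_c = ground_truth == cls
        union = np.logical_or(pred_c, true_c).sum()
        intersection = np.logical_and(pred_c, true_c).sum()
        ious.append(float(intersection / union) if union else float("nan"))
    return ious


# Toy example with three classes; one pixel of class 0 is mislabelled as class 1.
truth = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
ious = per_class_iou(pred, truth, num_classes=3)
mean_iou = float(np.nanmean(ious))  # average over the classes that occur
```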

Using your fine-tuned BCSS model

You can now start hosting your fine-tuned model for real-time inference - read more here.

Let us know if you would prefer us to host your model.
