Getting started

We will be building a relatively simple file upload system that supports resumable uploads.

A minimal implementation of the resumable upload described here is available in go-http-upload.

To keep the implementation simple, we will run a single-node server that accepts file uploads. Scoping the system to a single node lets us focus on the core functionality without worrying about distributed-systems concerns, because resumable uploads are inherently more complex when spread across multiple nodes. Consider how you would share the state of an upload across nodes, how you would transfer file chunks between nodes, and how you would resume an upload if the client reconnects to a different node.

In this lesson, let’s start by defining the API endpoints we will use to upload files.

  1. POST /api/v3/files: Initiates a new file upload. The client sends the metadata of the file to be uploaded, and the server responds with the location to upload the file to.
  2. HEAD /api/v3/files/{id}: Retrieves the metadata of the file by ID. This API is used to check the status and progress of the upload. The client can use it to see how much of the file has been uploaded and where to resume the upload.
  3. PATCH /api/v3/files/{id}: Uploads a chunk of the file. The client will send a chunk of the file to the server, and the server will append the chunk to the file.
  4. DELETE /api/v3/files/{id}: Cancels the upload of the file. The server will delete the file and its metadata.
  5. OPTIONS /api/v3/files: Retrieves the server’s capabilities. The client can query the server to determine which extensions are supported by the server.
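To make the endpoint list above concrete, here is a minimal sketch of how a server might map a method and path to one of the five operations. The function name resolve, the operation labels, and the placeholder handler are assumptions made for illustration; they are not taken from go-http-upload.

```go
package main

import (
	"net/http"
	"strings"
)

// basePath is the collection URL from the endpoint list above.
const basePath = "/api/v3/files"

// resolve maps a method and URL path to one of the five operations.
// The operation labels are hypothetical names used only in this sketch.
func resolve(method, path string) (op, id string, ok bool) {
	if !strings.HasPrefix(path, basePath) {
		return "", "", false
	}
	rest := strings.TrimPrefix(path, basePath)

	// Collection endpoints: /api/v3/files
	if rest == "" {
		switch method {
		case http.MethodPost:
			return "create", "", true
		case http.MethodOptions:
			return "capabilities", "", true
		}
		return "", "", false
	}

	// Item endpoints: /api/v3/files/{id}
	if !strings.HasPrefix(rest, "/") {
		return "", "", false
	}
	id = strings.TrimPrefix(rest, "/")
	if id == "" || strings.Contains(id, "/") {
		return "", "", false
	}
	switch method {
	case http.MethodHead:
		return "status", id, true
	case http.MethodPatch:
		return "append", id, true
	case http.MethodDelete:
		return "cancel", id, true
	}
	return "", "", false
}

// handler shows where resolve would sit in an HTTP server.
// It only reports the matched operation; the real work is left out.
func handler(w http.ResponseWriter, r *http.Request) {
	op, id, ok := resolve(r.Method, r.URL.Path)
	if !ok {
		http.Error(w, "unsupported method or path", http.StatusNotFound)
		return
	}
	// A real server would dispatch to the matching operation here.
	http.Error(w, op+" "+id+": not implemented in this sketch", http.StatusNotImplemented)
}
```

The handler can be registered with http.HandleFunc("/api/v3/files", handler) and http.HandleFunc("/api/v3/files/", handler). On Go 1.22 or later, http.ServeMux patterns such as "PATCH /api/v3/files/{id}" could replace the manual matching.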

The API endpoints above are deliberately basic. You can modify and extend them as you see fit.

For example, Google Cloud Storage uses the endpoint /upload/storage/v1/b/{BUCKET_NAME}/o?uploadType=resumable&name={OBJECT_NAME} to initiate a new file upload to a GCS bucket, letting the client specify the bucket and object name in the request URL itself. Presumably the same endpoint supports both resumable and non-resumable uploads, which is why it has an uploadType query parameter to select the upload type.
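As a small illustration of that URL shape, the sketch below builds the path and query string with Go's net/url package. The function name and its parameters are hypothetical, and the host is left out because it is not given above.

```go
package main

import "net/url"

// resumableInitURL builds the path and query of the GCS-style initiation
// request quoted above. The function name and parameters are hypothetical.
func resumableInitURL(bucket, object string) string {
	q := url.Values{}
	q.Set("uploadType", "resumable")
	q.Set("name", object)

	u := url.URL{
		Path:     "/upload/storage/v1/b/" + bucket + "/o",
		RawQuery: q.Encode(), // Encode sorts keys, so name comes before uploadType
	}
	return u.String()
}
```

Calling resumableInitURL("my-bucket", "photo.jpg") returns /upload/storage/v1/b/my-bucket/o?name=photo.jpg&uploadType=resumable. The parameter order differs from the URL above because Encode sorts the keys, which does not change the meaning of the query.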

Your business requirements may vary, so feel free to adjust the API endpoints as needed.

File Metadata

Metadata is data that describes the file being uploaded. It can include the file name, file size, file type, and any other information that you want to store about the file.

Some of the metadata is supplied by the client when initiating the upload. The server stores both the client-supplied and the server-generated metadata alongside the file so that the upload can be resumed later.

In our case, we will be storing the following metadata:

  • ID: A unique identifier for the file.
  • Name: The name of the file.
  • TotalSize: The total size of the file.
  • UploadedSize: The size of the file that has been uploaded so far.
  • ContentType: The type of the file.
  • Checksum: The checksum of the file. This can be used to verify the integrity of the file once all the chunks have been uploaded.
  • ExpiresAt: The time when the upload will expire. This can be used to automatically clean up incomplete uploads.
  • Path: The path where the file is stored.
  • Metadata: Any additional metadata that the client wants to store.

If we define the metadata as a struct in Go, it would look something like this:

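type File struct {
    ID           string    // unique identifier for the file
    Name         string    // name of the file
    TotalSize    uint64    // total size of the file
    UploadedSize uint64    // size uploaded so far
    Metadata     string    // additional metadata from the client
    ContentType  string    // type of the file
    Checksum     string    // used to verify integrity once all chunks are uploaded
    ExpiresAt    time.Time // when the upload expires (requires the "time" package)
    Path         string    // where the file is stored
}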

Storage Interface

The storage interface defines the methods we need to implement to store and retrieve the metadata of the file. It is how the rest of the system interacts with the storage layer.

The storage interface will have the following methods:

  • Save(f File) error: Save the metadata entry.
  • Find(id string) (File, bool, error): Get the metadata entry by file ID. The bool result reports whether an entry with that ID exists.

Let’s define the storage interface in Go:

type Storage interface {
    Save(f File) error
    Find(id string) (File, bool, error)
}

The tus protocol does not define how the metadata should be stored, so it is up to the implementation to decide. The simplest way to store the metadata is to use an in-memory map. In a real-world scenario, however, you would want to keep the metadata in persistent storage such as a database. I wrote a simple implementation using in-memory storage; you can check it out in go-http-upload.