Upload location

Location url

So far we have been discussing how to initiate an upload. The end result of the initiation is to create a new upload resource on the server. Hence, the server somehow needs to inform the client about the location of the created resource and where the client should upload the file content.

From the tus.io documentation:

The Server MUST acknowledge a successful upload creation with the 201 Created status. The Server MUST set the Location header to the URL of the created resource. This URL MAY be absolute or relative.

If we look at the sample response from the upload initiation:

HTTP/1.1 201 Created
Location: https://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216
Tus-Resumable: 1.0.0

We can see the 201 Created and the Location header. The Location header contains the URL of the created resource and the client should use this URL to upload the file content.

As you have seen earlier in Getting started, we defined the PATCH /api/v3/files/{id} endpoint to upload the file content. This endpoint is the URL that the server returns via the Location header. In our case, the response looks like this:

HTTP/1.1 201 Created
Location: /api/v3/files/d0c5aa97-4ef7-48db-a3e1-6b07170bf3d5

Choosing storage backend

Before we proceed with the implementation, let’s discuss where we should store the uploaded file content.

Standalone server

You may think that we can store the file content in the filesystem on the server. This is a valid approach and is commonly used. In fact, that is the approach we will use in our implementation.

On a standalone server, we can store the file content at a fixed path. For example, we can store it at /var/www/uploads/{id}, where {id} is the unique identifier of the file. When the client resumes the upload, we can simply append the new chunk to the existing file. Once the upload is complete, we can compare the file's checksum with the checksum we stored earlier to ensure the integrity of the file.

However, it is important to note that storing the file content in the filesystem has some limitations. Even when you are not using resumable upload, you will have the following problems:

  • If your server crashes, you may lose the file content.
  • If the file content grows, you may need to scale the server’s storage.

Then can we just add more servers to solve this? Not really. It is not as simple as it sounds, because resumable upload is not stateless like your typical REST API. Let's see why.

Multiple servers

In resumable upload, the client uploads file chunks to the same endpoint multiple times. But you can't guarantee that every chunk will reach the same server and get appended automatically. The client can upload one chunk to server A and the next chunk to server B. Hence you might end up with different chunks of the file stored on different servers.

Since the file chunks could be stored in different servers, you need to have a way to collect the file chunks from all servers to reconstruct the file and ensure that the file is stored in the correct order. This is not an easy task and requires a lot of effort to implement.

This is why there are so many companies providing storage services (e.g. Dropbox, Google Drive, iCloud, etc.). They have the infrastructure to handle this kind of problem.

Then what can we do to solve this problem? We can consider using blob storage.

Blob storage

Blob storage is an easier way to solve the problem of storing the file content. Blob storage services like Amazon S3, Google Cloud Storage, or Azure Blob Storage are designed to store large amounts of unstructured data. They are highly scalable and provide high availability and durability, so you don't have to worry about losing the file content if a server crashes.

In fact, Google Cloud Storage (GCS), like most other blob storage providers, offers a resumable upload API that you can use to upload files to the blob storage. You may see the GCS resumable upload API here. Hence, for each upload to your API, you can use the resumable upload API to store the file chunks. If you decide to store the chunks of each upload as separate GCS objects, you can later asynchronously use GCS compose to combine those chunks into a single file. On the other hand, if you decide to store the uploaded file as a single GCS object, you can just use the resumable upload API to upload the file content.

Another thing to note is that the GCS Go SDK already exposes uploads through Go's io.Reader and io.Writer interfaces. Given that, you can pass them directly as the arguments of io.Copy. By doing so, while providing your own resumable API, you can ensure that the file content is stored in highly available and durable storage.

Implementation

Since we will only be using the filesystem to store the file content, we don't really need to create an empty file during initiation; we can just store the path of the file content in the file metadata.

Let’s modify our metadata constructor by adding the location of the file.

func NewFile() File {
    id := uuid.New().String()
    f := File{
        // .. other fields
        Path: "/tmp/file-upload-" + id,
    }
    return f
}

Then, let's add these few lines at the end of the CreateUpload handler to return the location of the created resource.

w.Header().Add("Location", fmt.Sprintf("http://127.0.0.1:8080/files/%s", fm.ID))
w.WriteHeader(http.StatusCreated)

That’s it! Now when the client receives the response, they should use the Location header to upload the file content.