The Multi-Worker Concept

Introduction

Ever wonder how YouTube and other media platforms handle multiple video uploads? Before beginning this project, I could not fathom how multiple users could upload at the same time without causing server errors. Since JavaScript is a single-threaded programming language, it's hard to understand how it can achieve concurrency-like properties that allow multiple simultaneous uploads.

In this post, we'll walk through the solution with a short YouTube-like architecture design for handling multiple uploads.

Problem Statement

We know that JavaScript is a single-threaded language. A naive approach would be a single server with an API that clients upload to, which the server then transcodes and saves into a database. But when two uploads come in at the same time, that server makes the second user wait until the first one finishes. If 10 users upload at the same time, the 10th user has to wait until all pending videos are handled. This design does not scale.

The Solution

The solution is broken down into two parts:

Part 1: A simple server API that only handles the initial request to upload. It provides a signed URL so the upload happens directly from the client side.

Part 2: A separate server (or "worker") solely made for pulling the uploaded video from the bucket, performing video transcoding, and storing the transcoded file back into a bucket.

This concept all makes sense when we introduce containerized workers (Docker) and a message queue (Redis Streams).

The Deep Dive: Data Pipeline and Architecture for Concurrency

Step 1: The Secure Upload Handshake

First, we only allow authenticated users to upload, which involves frontend configuration that I'll skip for now. The user makes a request to the API to get a signed URL for a direct upload from the client into a bucket (a MinIO bucket in my case). This keeps our server lightweight, since the heavy upload traffic flows directly between the client and the bucket.

// router function that returns a signed URL to the frontend
router.post('/get-signed-url', async (req, res) => {
  // the server generates a unique object name, so it is always defined
  const fileName = `video-${Date.now()}.webm`;
  try {
    // the URL is valid for 600 seconds (10 minutes)
    const presignedUrl = await minioClient.presignedPutObject(
      'firstbucket',
      fileName,
      600
    );
    return res.status(200).json({ signedURL: presignedUrl, uploadFileName: fileName });
  } catch (err) {
    console.log('error generating presigned URL', err.message);
    return res.status(500).json({ error: 'Server error: ' + err.message });
  }
});
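On the client side, the flow is: ask the API for a signed URL, then PUT the file straight to the bucket. The repo's actual frontend code isn't shown in this post, so here is a minimal sketch of that handshake; the function name is hypothetical, and the fetch implementation is injected as a parameter so the helper is easy to exercise without a browser:

```javascript
// Hypothetical client-side helper for the signed-URL handshake.
// The '/get-signed-url' endpoint matches the router above; everything
// else here is an illustrative assumption, not the repo's actual code.
async function uploadDirectToBucket(file, fetchImpl = fetch) {
  // 1. ask our lightweight API for a signed URL
  const res = await fetchImpl('/get-signed-url', { method: 'POST' });
  const { signedURL, uploadFileName } = await res.json();

  // 2. upload the file straight to the bucket, bypassing our server
  const put = await fetchImpl(signedURL, { method: 'PUT', body: file });
  if (!put.ok) throw new Error('direct upload failed');

  // 3. return the server-generated object name for the notification step
  return uploadFileName;
}
```

The returned `uploadFileName` is what the client sends back in the notification step below.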

Step 2: The Notification

Using the same API server that issued the signed URL, the client sends another request after a successful upload, with the name of the uploaded file in the request body. The server then pushes a message onto a Redis stream for its consumers to pick up.

  await client.xAdd('upload-stream', '*', {
    fileName,
    bucket: bucketName,
    status: 'pending',
  });
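One detail worth calling out: before workers can read with a consumer group, the group has to exist. A minimal startup sketch, assuming a connected node-redis v4 client and the stream/group names used throughout this post (the `ensureGroup` helper name is my own):

```javascript
// Minimal sketch: create the consumer group once at startup. Redis raises
// a BUSYGROUP error if the group already exists, which is safe to ignore.
async function ensureGroup(client, stream = 'upload-stream', group = 'upload-group') {
  try {
    // MKSTREAM creates the stream too if it doesn't exist yet
    await client.xGroupCreate(stream, group, '0', { MKSTREAM: true });
    return 'created';
  } catch (err) {
    if (err && String(err.message).includes('BUSYGROUP')) return 'exists';
    throw err; // anything else is a real failure
  }
}
```

Running this idempotently at worker startup means any number of worker instances can boot in any order.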

Step 3: The Worker, the Heart of the Operation

This worker server is containerized and continuously listens for Redis stream messages. Each message includes the filename to fetch and some other metadata. The worker loads the file from the bucket, transcodes it, and then stores the new video and a thumbnail into two different buckets.

  
  // listen for Redis stream messages as part of a consumer group
  while (running) {
    const uploads: any = await client.xReadGroup(
      'upload-group',
      consumerName,
      { key: 'upload-stream', id: '>' },
      { COUNT: 4, BLOCK: 5000 }
    );
    if (!uploads) continue; // BLOCK timed out with no new messages

Each message is then handed off for transcoding, with a limit of 3 running at a time (limiting concurrency keeps the CPU from being overwhelmed when upload numbers spike):

    const limit = pLimit(3);

    const results = await Promise.allSettled(
      uploads[0].messages.map((msg: RedisStreamMessage) =>
        limit(() =>
          transcodingVideo(
            msg.message.fileName,
            `transcoded/${cleanUpName(msg.message.fileName)}.webm`,
            { stream: 'upload-stream', group: 'upload-group', id: msg.id }
          )
        )
      )
    );

    results.forEach((result, index) => {
      if (result.status === 'rejected') {
        const msg = uploads[0].messages[index];
        console.log('failed to transcode', msg);
      }
    });
  }

Horizontal Scaling

With the combination of Redis Streams consumer groups and multiple worker instances, I can horizontally scale the service. If one user uploads, one worker handles it. But if 100 users upload, I can increase the number of workers to 20 or more.

Redis Streams distributes the messages across the workers in a consumer group. If 5 users upload at a time and we have 5 workers, each file message is delivered to a different worker. If a 6th upload comes in, its message waits in the stream until one of the workers finishes its current task and becomes available to process it.
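Since every worker runs the same containerized image, scaling is just running more replicas. A hypothetical Docker Compose fragment (the service names and paths are placeholders, not the repo's actual files) could look like:

```yaml
# Hypothetical compose file: the worker is stateless, so scaling it out
# is simply running more replicas of the same image.
services:
  api:
    build: ./api
    ports: ["3000:3000"]
  worker:
    build: ./worker
    deploy:
      replicas: 5        # or at runtime: docker compose up --scale worker=5
  redis:
    image: redis:7
```

Each replica joins the same consumer group under a unique consumer name, so Redis hands each pending message to exactly one of them.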

Conclusion and Key Learnings

The full project for this article, including the API that returns signed URLs and the worker code that consumes Redis Streams, is available on GitHub: derikesh/youtube-skeleton-workers.

  1. How much of the payload can be taken off the main server by using direct client-to-bucket uploads, saving server time and resources.
  2. The importance of decoupling a lightweight API from heavy tasks to allow for a smooth user experience.
  3. Using multiple workers with a message queue like Redis Streams (or any pub/sub service) lets us handle intensive tasks separately from the main server, ensuring better task handling.
  4. How a child_process in Node.js can help with transcoding, so the main server process keeps running independently and is not brought down by a transcoding failure.

Note: The above architecture is best for learning and introducing the concept of multiple workers. It demonstrates the fundamentals of scaling software but is not a production-grade design. It also leaves many edge cases unhandled, but for simply learning how the backend works, it's a good project.
