<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ video - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ video - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Thu, 14 May 2026 22:43:50 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/tag/video/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ The WebCodecs Handbook: Native Video Processing in the Browser  ]]>
                </title>
                <description>
                    <![CDATA[ If you've ever tried to process video in the browser, like for a video editing or streaming app, your options were either to process video on a server (expensive) or to use ffmpeg.js (clunky). With th ]]>
                </description>
                <link>https://www.freecodecamp.org/news/the-webcodecs-handbook-native-video-processing-in-the-browser/</link>
                <guid isPermaLink="false">69d6bc21707c1ce7688010d3</guid>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ HTML5 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ #WebCodecs  ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Sam Bhattacharyya ]]>
                </dc:creator>
                <pubDate>Wed, 08 Apr 2026 20:35:45 +0000</pubDate>
                <media:content url="https://cdn.hashnode.com/uploads/covers/5e1e335a7a1d3fcc59028c64/9b0978d4-7d8c-464c-ade0-07d007f56d92.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>If you've ever tried to process video in the browser, like for a video editing or streaming app, your options were either to process video on a server (expensive) or to use ffmpeg.js (clunky). With the WebCodecs API, there's now a better way to do this.</p>
<p>WebCodecs is a relatively new API that allows browser applications to process video efficiently with very low-level control.</p>
<p>In the past, if you wanted to build, say, a video-editing app or live-streaming studio or anything that required 'heavy lifting', you needed to build a native desktop application. Many SaaS tools like Canva got around this with server-side video processing, which provided a much better UX, but which is much more complex and expensive.</p>
<p>With WebCodecs, it's now possible to build these apps entirely in the browser, without requiring users to download and install software, and without expensive, complex server infrastructure.</p>
<p>This isn't theoretical. Video editing tools like CapCut saw an 83% boost in traffic after switching to WebCodecs + WebAssembly [<a href="https://web.dev/case-studies/capcut?hl=en">1</a>]. Utility apps like <a href="https://www.remotion.dev/convert">Remotion Convert</a> and <a href="https://free.upscaler.video/">Free AI Video Upscaler</a> (both open source) process thousands of videos a day with zero server costs and no installation required [<a href="https://web.dev/case-studies/ai-video-upscaler-case-study">2</a>].</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/73f19be2-b746-421c-888f-7431962520d7.png" alt="Remotion Convert" style="display:block;margin:0 auto" width="630" height="566" loading="lazy">

<p>WebCodecs is even being used for entirely new use cases, like generating videos programmatically [<a href="https://github.com/remotion-dev/remotion">3</a>].</p>
<p>If you're building any kind of video app, it's worthwhile to at least know about WebCodecs as an option for working with video in the browser.</p>
<p>In this guide, we will:</p>
<ol>
<li><p>Review the basics of Video Processing</p>
</li>
<li><p>Introduce the WebCodecs API</p>
</li>
<li><p>Discuss Muxing + Demuxing to read and write video files</p>
</li>
<li><p>Build our own video conversion utility to convert videos between webm + mp4, and apply basic transformations</p>
</li>
<li><p>Cover some production-level concerns</p>
</li>
<li><p>Discuss additional resources</p>
</li>
</ol>
<p>The goal of this article is to be a practical entry point and introduction to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebCodecs_API">WebCodecs API</a> for frontend developers. It'll teach you how the API works and what you can do with it. I'll assume you know the basics of JavaScript, but you don't need to be a senior developer or a video engineer to follow along.</p>
<p>At the end, I'll mention additional learning resources and references. In future tutorials, I'll go more in-depth on specific topics like building a video editor, or doing live-streaming with WebCodecs. But this handbook should provide a solid starting point for what WebCodecs is, what it can do, and how to build a basic application with it.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><p><a href="#heading-prerequisites">Prerequisites</a></p>
</li>
<li><p><a href="#heading-primer-on-video-processing">Primer on Video Processing</a></p>
<ul>
<li><p><a href="#heading-video-frames">Video Frames</a></p>
</li>
<li><p><a href="#heading-codecs">Codecs</a></p>
</li>
<li><p><a href="#heading-encoding-amp-decoding">Encoding &amp; Decoding</a></p>
</li>
<li><p><a href="#heading-containers">Containers</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-what-is-webcodecs">What is WebCodecs?</a></p>
<ul>
<li><p><a href="#heading-before-webcodecs">Before WebCodecs</a></p>
</li>
<li><p><a href="#heading-core-api">Core API</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-muxing-and-demuxing">Muxing and Demuxing</a></p>
<ul>
<li><p><a href="#heading-demuxing">Demuxing</a></p>
</li>
<li><p><a href="#heading-muxing">Muxing</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-building-a-video-converter-utility">Building a Video Converter Utility</a></p>
<ul>
<li><p><a href="#heading-transcoding">Transcoding</a></p>
</li>
<li><p><a href="#heading-transformations">Transformations</a></p>
</li>
<li><p><a href="#heading-transform-pipeline">Transform Pipeline</a></p>
</li>
<li><p><a href="#heading-complete-demo">Complete Demo</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-production-concerns">Production Concerns</a></p>
<ul>
<li><p><a href="#heading-codecs-1">Codecs</a></p>
</li>
<li><p><a href="#heading-bit-rate">Bit rate</a></p>
</li>
<li><p><a href="#heading-gpu-vs-cpu">GPU vs CPU</a></p>
</li>
<li><p><a href="#heading-memory">Memory</a></p>
</li>
</ul>
</li>
<li><p><a href="#heading-further-resources">Further Resources</a></p>
</li>
</ul>
<h2 id="heading-prerequisites">Prerequisites</h2>
<p>You don't need to be a video engineer to follow along, but you should be comfortable with:</p>
<ul>
<li><p>Core JavaScript, including async/await and callbacks</p>
</li>
<li><p>Basic browser APIs like fetch and the DOM</p>
</li>
<li><p>What a File object is and how file inputs work in HTML</p>
</li>
<li><p>A general sense of what HTML5 is (we'll use it briefly, but won't go deep)</p>
</li>
</ul>
<p>No prior knowledge of video processing, codecs, or media APIs is required — that's what the first half of this handbook covers.</p>
<h2 id="heading-primer-on-video-processing">Primer on Video Processing</h2>
<p>Hold your bunnies, because before getting into WebCodecs, I want to make sure you know what codecs are before we even consider putting them on the web.</p>
<h3 id="heading-video-frames">Video Frames</h3>
<p>I presume you know what a video is. Ironically, the 'video' below is actually a GIF, but you get the idea.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/0e95c1f7-d384-4065-bba9-4dade19ce6f8.gif" alt="Big Buck Bunny, an opensource video" style="display:block;margin:0 auto" width="320" height="180" loading="lazy">

<p>Videos are just a series of images, shown one after the other, in quick succession. Each image is called a <em>Video Frame</em>, and each frame is associated with a timestamp. When a video player plays back the video, it displays each video frame at the time indicated by the timestamp.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/26f534e8-f828-41ec-9bcd-de059916f528.png" alt="Video Frames" style="display:block;margin:0 auto" width="540" height="360" loading="lazy">

<p>Every frame in the video is made of pixels, with a 4K video frame containing approximately 8 million pixels (3840*2160 = 8294400).</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/2c73397d-2d9f-4091-8b32-fa7a5dac115f.png" alt="VideoFrames have pixels" style="display:block;margin:0 auto" width="960" height="540" loading="lazy">

<p>Each pixel itself is actually made of 3 components: a Red, Green, and Blue value (also called RGB value).</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/2fc2b555-265b-4d6a-8670-981e197c6566.svg" alt="RGB Channels" style="display:block;margin:0 auto" width="1152" height="288" loading="lazy">

<p>Each of the R, G and B color values is stored as an 8-bit integer, ranging from 0 to 255, with the number indicating the intensity of the red, green, or blue color component.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/d9d7ae0b-e67b-4cbb-b21a-867108cd3303.png" alt="uint8 color channel" style="display:block;margin:0 auto" width="960" height="384" loading="lazy">

<p>Combining the intensity of each of the R, G, and B components lets you represent any arbitrary color on the color spectrum:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/adbeb63b-20b8-4ddb-9b7f-83c88f86e09d.png" alt="RGB Color value examples" style="display:block;margin:0 auto" width="1152" height="1152" loading="lazy">

<p>So for each pixel, we need 3 bytes of data: 1 byte for each of the R, G, and B color values (1 byte = 8 bits). A 4K video frame therefore would contain ~25 Megabytes of data.</p>

<p>At 30 frames per second (a typical frame rate), raw 4K video works out to about 746 Megabytes every second – roughly <strong>2.7 Terabytes of data</strong> for a 1 hour video. If you've ever downloaded a large video or recorded HD video with your phone camera, you'll know that video files can be large, but they're never <em>that</em> large.</p>
<p>In reality, actual video files you might watch on YouTube, record on your phone camera, or download from the internet are ~100x smaller than that. The reason actual video files are much smaller is because of <em>video compression</em>, a family of very sophisticated algorithms that help reduce the data by ~100x.</p>
<p>Without this video compression, you wouldn't be able to record more than 10 minutes of video on the latest high-end smartphones, and you wouldn't be able to stream anything HD on a high-end home internet connection.</p>
<p>As sophisticated as our modern devices and internet connections are, without aggressive video compression, we wouldn't be able to watch, record, or stream anything in HD.</p>
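<p>As a quick sanity check on these numbers, here's the arithmetic as a snippet (purely illustrative, not part of any API):</p>

```javascript
// Raw, uncompressed video size, assuming 3 bytes (R, G, B) per pixel.
const BYTES_PER_PIXEL = 3;

function rawFrameBytes(width, height) {
  return width * height * BYTES_PER_PIXEL;
}

function rawBytesPerSecond(width, height, fps) {
  return rawFrameBytes(width, height) * fps;
}

console.log(rawFrameBytes(3840, 2160));         // 24883200  (~25 MB per 4K frame)
console.log(rawBytesPerSecond(3840, 2160, 30)); // 746496000 (~746 MB per second)
```

<p>Multiply that per-second figure by 3600 seconds and it's clear why uncompressed video is a non-starter for storage and streaming.</p>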
<h3 id="heading-codecs">Codecs</h3>
<p>A <em>codec</em> is a fancy word for a video compression algorithm. There are a few established codecs / compression algorithms, such as:</p>
<ul>
<li><p><code>h264</code>: The most common codec. If you see an mp4 file, it most likely uses the h264 codec.</p>
</li>
<li><p><code>vp9</code>: An open source codec used commonly by YouTube and in video conferencing, often found in webm files.</p>
</li>
<li><p><code>av1</code>: A new open source codec, increasingly being used by platforms like YouTube and Netflix.</p>
</li>
</ul>
<p>How these algorithms work is too complex and out of scope for this handbook. But at a very high level, here are some major ways these algorithms compress video:</p>
<h4 id="heading-removing-detail">Removing detail</h4>
<p>All these algorithms use a technique called the Discrete Cosine Transform to "remove details". As you remove "detail" from the video frame, the frame starts looking "blockier". This technique is so effective, though, that you can compress a video frame by ~10x before the differences start becoming visible to the human eye.</p>
<p>For the curious, you can see <a href="https://www.youtube.com/watch?v=n_uNPbdenRs">this video</a> by Computerphile on how the DCT algorithm works.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/f2b3eac4-c36a-4943-a176-4f2af72cfbac.png" alt="DCT algorithm removing details" style="display:block;margin:0 auto" width="1920" height="480" loading="lazy">

<h4 id="heading-encoding-frame-differences">Encoding frame differences</h4>
<p>When you actually look at a sequence of video frames, you'll notice that visually they're quite similar, with only small portions of the video changing, depending on how much movement there is.</p>
<p>These codecs/compression algorithms use sophisticated math and computer vision techniques to encode just the differences between frames.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/40d309dd-84fc-4018-8da4-1438bbfb8df6.png" alt="Frame Differences" style="display:block;margin:0 auto" width="960" height="288" loading="lazy">

<p>You therefore only need to send the first frame (a <em>Key Frame</em>) – then for subsequent frames you can send the "frame differences", also called <em>Delta Frames</em>, to reconstruct each full frame.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/66bbe03f-c394-477d-a3c2-64b571c39348.png" alt="Key Frames vs Delta frames" style="display:block;margin:0 auto" width="960" height="288" loading="lazy">

<p>In practice, for an hour-long video, we don't just encode the first frame and store millions of delta frames. Instead, algorithms encode every 60th frame or so as a Key Frame, and then the next 59 frames are delta frames.</p>
<p>This technique is also highly effective, reducing data used by another ~10x. The distinction between <em>Key Frames</em> and <em>Delta Frames</em> is one of the few bits of "how these algorithms work" that you actually need to be aware of.</p>
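<p>One practical consequence of Key Frames: to display an arbitrary frame, a player has to decode forward from the nearest preceding Key Frame, because Delta Frames only make sense on top of it. A toy illustration (the helper is hypothetical, not a WebCodecs API):</p>

```javascript
// With a key frame every 60 frames (indices 0, 60, 120, ...), decoding
// must start at the key frame at or before the frame we want to show.
function decodeStartIndex(targetFrame, gopSize = 60) {
  return Math.floor(targetFrame / gopSize) * gopSize;
}

console.log(decodeStartIndex(59));  // 0   (decode frames 0..59 to show frame 59)
console.log(decodeStartIndex(60));  // 60  (frame 60 is itself a key frame)
console.log(decodeStartIndex(130)); // 120
```

<p>This is exactly why seeking in a video player sometimes snaps or stalls briefly: it has to jump back to a Key Frame and decode forward.</p>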
<p>There are a number of other details and compression techniques that go into these algorithms that are out of scope for an intro article.</p>
<h3 id="heading-encoding-amp-decoding">Encoding &amp; Decoding</h3>
<p>For video compression to work, we need to be able to both compress video (turn raw video into compressed binary data) and then decompress video (turn the compressed binary data back into raw video frames).</p>
<p>Turning raw video frames into compressed binary data is called <em>encoding</em>, and turning compressed binary data back into raw video frames is called <em>decoding</em>. The word <em>codec</em> is just a contraction of "coder/decoder".</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/843e75e0-4453-4e3e-91d5-a441f1a8e2ff.png" alt="VideoEncoder and VideoDecoder" style="display:block;margin:0 auto" width="960" height="384" loading="lazy">

<p>From a practical, developer perspective, you don't need to know how these codecs work, but you do need to know that:</p>
<ol>
<li><p>There are different video codecs, like <code>h264</code>, <code>vp9</code>, and <code>av1</code></p>
</li>
<li><p>When you encode a video with a codec (like <code>h264</code>), you need a video player that can support the same codec to play back the video.</p>
</li>
<li><p>Encoding video takes a lot more computation than decoding video, so playing 4K video on a low-end phone is fine, but encoding 4K video on it would be super slow.</p>
</li>
<li><p>Most consumer devices (phones, laptops) have specialized chips designed specifically for encoding and decoding video, making encoding/decoding much faster than if run on the CPU like a normal software program. This is called <em>hardware acceleration.</em></p>
</li>
</ol>
<p>In practice, there are only a handful of video codecs, because the entire world needs to agree on standards so that video recorded on an iPhone can be played back on a Windows device.</p>
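<p>Even so, support varies by browser and device, so in the browser you'd check a configuration with <code>VideoEncoder.isConfigSupported()</code> before committing to a codec. Here's a sketch of a fallback picker; the helper and the injected checker are my own illustration, not a standard API:</p>

```javascript
// Hypothetical helper: return the first encoder config the environment
// supports. `isSupported` is injected so the logic is testable; in the
// browser you would pass something like:
//   cfg => VideoEncoder.isConfigSupported(cfg).then(res => res.supported)
async function pickSupportedConfig(candidates, isSupported) {
  for (const config of candidates) {
    if (await isSupported(config)) return config;
  }
  return null; // nothing supported
}

// Example with a fake checker that only "supports" h264 (avc1):
pickSupportedConfig(
  [{ codec: 'av01.0.04M.08' }, { codec: 'vp09.00.10.08' }, { codec: 'avc1.42001f' }],
  async (cfg) => cfg.codec.startsWith('avc1')
).then((cfg) => console.log(cfg.codec)); // "avc1.42001f"
```

<p>Ordering the candidate list from most to least preferred codec gives you graceful degradation across devices.</p>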
<h3 id="heading-containers">Containers</h3>
<p>Most people haven't heard of <code>h264</code> or <code>vp9</code>. When you think of video files, you typically think of file formats like MP4 or MKV. These are also relevant, but they're a separate thing called containers.</p>
<p>A video file typically has encoded audio, encoded video, and metadata about the video file. A file format like MP4 describes a specific format for storing the encoded audio and video data, as well as the metadata.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/961e7177-e331-4aa2-acdc-9d6732f18c56.png" alt="Video Container" style="display:block;margin:0 auto" width="624" height="720" loading="lazy">

<p>Video compression software stores the encoded audio/video and metadata into a file according to the file format / specs. This is called <em>muxing.</em></p>
<p>Likewise, video players follow the file format specs to read the metadata and find the encoded audio/video. This is called <em>demuxing</em>.</p>
<p>When compressing a video file, you need to both <em>encode</em> it and <em>mux</em> it (in that order). These are two separate stages of the process. Likewise, when playing a video file, you need to both <em>demux</em> it and then <em>decode</em> it (in that order).</p>
<p>When a video player opens, say, an mp4 file, the logic flow is as follows:</p>
<ul>
<li><p>Ok, the file ends in .mp4, so it must be an mp4 file. Let me load the library for parsing mp4 files and then parse the file.</p>
</li>
<li><p>Great, I've parsed the mp4 file. I now have the metadata and know the byte offsets where the encoded audio and video are stored.</p>
</li>
<li><p>I'll start fetching the first encoded video frames, decode them, and start displaying the decoded video frame to the user.</p>
</li>
</ul>
<p>If you ever see a "video file is corrupt" message from a video player, it's likely that the video file doesn't follow the file format spec and there was an error while trying to parse / demux the video.</p>
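<p>Incidentally, players don't really trust file extensions – containers announce themselves with "magic bytes" at the start of the file. Here's a rough, deliberately incomplete sketch of that sniffing step (real demuxers check far more than this):</p>

```javascript
// Rough sketch: identify a container by its magic bytes rather than
// its file extension. Not exhaustive; real demuxers do much more.
function sniffContainer(bytes) {
  // MP4-family files carry the ASCII tag "ftyp" at byte offset 4.
  if (bytes.length >= 8 &&
      bytes[4] === 0x66 && bytes[5] === 0x74 &&
      bytes[6] === 0x79 && bytes[7] === 0x70) {
    return 'mp4';
  }
  // Matroska-family files (mkv, webm) start with the EBML header 1A 45 DF A3.
  if (bytes.length >= 4 &&
      bytes[0] === 0x1a && bytes[1] === 0x45 &&
      bytes[2] === 0xdf && bytes[3] === 0xa3) {
    return 'matroska/webm';
  }
  return 'unknown';
}

console.log(sniffContainer(new Uint8Array([0, 0, 0, 0x20, 0x66, 0x74, 0x79, 0x70])));
// "mp4"
```

<p>In practice you'd hand this job to a demuxing library, but it's useful to know that the container identifies itself inside the file, not in its name.</p>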
<h2 id="heading-what-is-webcodecs">What is WebCodecs?</h2>
<p>Now that we've covered codecs, let's put them on the Web.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/3fc9d220-f397-479a-9517-6b8dc94aaa6b.png" alt="WebCodecs = Web + Codecs" style="display:block;margin:0 auto" width="960" height="288" loading="lazy">

<p>WebCodecs is an API that allows frontend developers to encode and decode video in the browser efficiently (using hardware acceleration), and with very low level control (encode/decode on a frame by frame basis).</p>
<p>The hardware acceleration bit is important, as you can't just polyfill or re-implement the API yourself. WebCodecs gives direct access to specialized hardware for encoding/decoding, making it as performant as a desktop video app.</p>
<h3 id="heading-before-webcodecs">Before WebCodecs</h3>
<p>It's worth taking a moment to understand why WebCodecs exists. Before the WebCodecs API existed, there were several alternatives you could use for video operations in the browser.</p>
<ul>
<li><p><a href="https://developer.mozilla.org/en-US/docs/Web/API/HTMLVideoElement">HTMLVideoElement</a>: You can simply create a <code>&lt;video&gt;</code> element and use it to decode a video. It's easy to use, but you lack frame-level control. Your only control is setting the <code>video.currentTime</code> property and waiting for it to seek, often leading to dropped/missing frames.</p>
</li>
<li><p><a href="https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder">MediaRecorder API</a>: Essentially allows you to 'screen record' any canvas element or video stream. While it works, it's functionally equivalent to screen recording Adobe Premiere Pro instead of clicking Render. For editing scenarios, you lose frame-level control and can only process video at real-time speed.</p>
</li>
<li><p><a href="https://github.com/Kagami/ffmpeg.js/">ffmpeg.js</a>: A port of the popular video processing tool ffmpeg, which runs ffmpeg in the browser. Many tools used this in the past, but it lacks hardware acceleration, making it much slower than WebCodecs. It also has file size restrictions stemming from the fact that it runs in WebAssembly, making it difficult to work with videos larger than 100 MB.</p>
</li>
</ul>
<p>WebCodecs was built and released in 2021 to enable low-level, hardware accelerated video decoding and encoding. It's great for high-performance streaming and video editing, which were use cases not well-served by the existing APIs.</p>
<h3 id="heading-core-api">Core API</h3>
<p>The core API for WebCodecs consists of two new "data types", the <em>VideoFrame</em> and <em>EncodedVideoChunk</em>, as well as the <em>VideoEncoder</em> and <em>VideoDecoder</em> interfaces.</p>
<h4 id="heading-videoframe">VideoFrame</h4>
<p>The JavaScript <em>VideoFrame</em> object conceptually contains both pixel data and metadata about the video frame.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/bd341249-fab6-4c76-85b4-dd21d8de10f6.svg" alt="VideoFrame object" style="display:block;margin:0 auto" width="1920" height="816" loading="lazy">

<p>You can actually create a new <em>VideoFrame</em> object from any image source, as long as you include the metadata:</p>
<pre><code class="language-javascript">const bitmapFrame = new VideoFrame(imgBitmap, {timestamp: 0});

const imageFrame = new VideoFrame(htmlImageEl, {timestamp: 0});

const videoFrame = new VideoFrame(htmlVideoEl, {timestamp: 0});

const canvasFrame = new VideoFrame(canvasEl, {timestamp: 0});
</code></pre>
<p>For a video editing app, for example, you would typically perform image editing operations on each frame on a canvas, and then you would grab each <em>VideoFrame</em> from the canvas.</p>
<p>You can also draw a <em>VideoFrame</em> to a canvas using the <a href="https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D">Canvas 2D rendering context</a>:</p>
<pre><code class="language-typescript">ctx.drawImage(frame, 0, 0);
</code></pre>
<p>You would typically do this when rendering / playing back a video in the browser.</p>
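<p>One detail worth knowing before we go further: <em>VideoFrame</em> timestamps are expressed in microseconds. For a constant-frame-rate source, you might compute them like this (an illustrative helper, not part of the API):</p>

```javascript
// VideoFrame timestamps are in microseconds (1 second = 1,000,000 µs).
function frameTimestampUs(frameIndex, fps) {
  return Math.round(frameIndex * 1_000_000 / fps);
}

// e.g. when building frames from a canvas at 30 fps (browser sketch):
//   new VideoFrame(canvasEl, { timestamp: frameTimestampUs(i, 30) });
console.log(frameTimestampUs(1, 30));  // 33333
console.log(frameTimestampUs(30, 30)); // 1000000 (one second in)
```

<p>The earlier examples pass <code>{timestamp: 0}</code> only because they create a single standalone frame.</p>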
<h4 id="heading-encodedvideochunk">EncodedVideoChunk</h4>
<p>An <em>EncodedVideoChunk</em> is just the compressed version of a <em>VideoFrame,</em> containing the binary data as well as the same metadata as the frame.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/d20b9306-03ae-48a8-b10a-f462f4a620e2.svg" alt="EncodedVideoChunk" style="display:block;margin:0 auto" width="1728" height="816" loading="lazy">

<p>You would typically get <em>EncodedVideoChunks</em> from a library which extracts them from a <em>File</em> object.</p>
<pre><code class="language-typescript">import { getVideoChunks } from 'webcodecs-utils'

const chunks = &lt;EncodedVideoChunk[]&gt; await getVideoChunks(&lt;File&gt; file);
</code></pre>
<p>Alternatively, it's the output you get from a <em>VideoEncoder</em> object.</p>
<p>There's not much useful stuff you can do with <em>EncodedVideoChunks</em> – it's just the binary data that you read from files, write to files, or stream over the internet.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/c5778116-44ec-4fb7-8f35-6ebd2f2a8d15.gif" alt="Video streaming with encode and decode" style="display:block;margin:0 auto" width="800" height="560" loading="lazy">

<p>The value in <em>EncodedVideoChunk</em> is that it's ~100x smaller than raw video data, which is why you'd send <em>EncodedVideoChunks</em> instead of raw video when streaming (and writing to a file).</p>
<h4 id="heading-videoencoder">VideoEncoder</h4>
<p>A <em>VideoEncoder</em> turns <em>VideoFrame</em> objects into <em>EncodedVideoChunk</em> objects.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/612a6f6a-04c6-4f24-aebd-074f232cd06e.svg" alt="VideoEncoder" style="display:block;margin:0 auto" width="960" height="384" loading="lazy">

<p>The core API looks something like this, where you define the callback where the <em>VideoEncoder</em> returns <em>EncodedVideoChunk</em> objects.</p>
<pre><code class="language-typescript">const encoder = new VideoEncoder({
    output: function(chunk: EncodedVideoChunk, meta: any){
        // Do something with the chunk
    },
    error: function(e: any){ console.warn(e); }
});
</code></pre>
<p>Keep in mind that this is an async process, and not even a typical async process. You can't just treat this as a per-frame operation.</p>
<pre><code class="language-javascript">// Does not work like this
const chunk = await encoder.encode(frame);
</code></pre>
<p>This is because of how video encoding actually works under the hood. So you have to accept that the outputs are returned via callback, and you get the outputs when you get them.</p>
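<p>A common pattern is to push each chunk into an array from the callback, then call <code>encoder.flush()</code> (which returns a promise) to wait until every pending output has been delivered. The collector helper below is my own sketch, not part of the API:</p>

```javascript
// Hypothetical collector: `output` is what you'd hand to the VideoEncoder
// constructor; `chunks` fills up as the encoder emits outputs.
function makeChunkCollector() {
  const chunks = [];
  return {
    chunks,
    output: (chunk, meta) => chunks.push({ chunk, meta }),
  };
}

// In the browser (sketch):
//   const { chunks, output } = makeChunkCollector();
//   const encoder = new VideoEncoder({ output, error: console.warn });
//   frames.forEach((f, i) => encoder.encode(f, { keyFrame: i % 60 === 0 }));
//   await encoder.flush(); // resolves once all pending chunks have been emitted
```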
<p>Once you define your encoder, you can then configure the <em>VideoEncoder</em> with your choice of codec (we'll get to this), as well as other parameters like width, height, framerate and bitrate.</p>
<pre><code class="language-typescript">encoder.configure({
    codec: 'vp09.00.10.08', // We'll get to this
    width: 1280,
    height: 720,
    bitrate: 1_000_000, // 1 Mbps
    framerate: 25
});
</code></pre>
<p>You can then start encoding frames. Here we assume we already have <em>VideoFrame</em> objects, and we make every 60th frame a <em>Key Frame</em>.</p>
<pre><code class="language-typescript">for (let i = 0; i &lt; frames.length; i++){
    encoder.encode(frames[i], {keyFrame: i % 60 === 0});
}
</code></pre>
<h4 id="heading-videodecoder">VideoDecoder</h4>
<p>The Video Decoder does the reverse, turning <em>EncodedVideoChunk</em> objects into <em>VideoFrame</em> objects.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/47258dd5-0a7b-479a-8729-6f384ff8bc4b.svg" alt="VideoDecoder" style="display:block;margin:0 auto" width="960" height="384" loading="lazy">

<p>Here's a simplified example of how to set up the <em>VideoDecoder.</em> First, extract the <em>EncodedVideoChunk</em> objects and the decoder config from the video file. Here, we don't choose the config&nbsp;– the config was chosen by whoever encoded the file. When decoding, we extract the config from the file.</p>
<pre><code class="language-typescript">import { demuxVideo } from 'webcodecs-utils';

const {chunks, config} = await demuxVideo(&lt;File&gt; file);
</code></pre>
<p>Next, we set up the <em>VideoDecoder</em> by specifying the callback when <em>VideoFrame</em> objects are generated, and we configure it with the config.</p>
<pre><code class="language-typescript">const decoder = new VideoDecoder({
    output: function(frame: VideoFrame){
        //do something with the VideoFrame
    },
    error: function(e: any){ console.warn(e); }
});

decoder.configure(config);
</code></pre>
<p>Again, like with <em>VideoEncoder</em>, it returns frames in a callback. Finally, we can start decoding chunks.</p>
<pre><code class="language-typescript">for (const chunk of chunks){
    decoder.decode(chunk);
}
</code></pre>
<h4 id="heading-putting-it-all-together">Putting it all together</h4>
<p>At its core, the WebCodecs API is just the two data types (<em>EncodedVideoChunk, VideoFrame)</em> and the <em>VideoEncoder</em> and <em>VideoDecoder</em> interfaces which convert between the two data types.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/444553ad-58af-4b0e-ba46-d8910d3d548b.svg" alt="The core of WebCodecs" style="display:block;margin:0 auto" width="960" height="384" loading="lazy">

<p>Keep in mind that the WebCodecs API doesn't actually work with video files. It only handles the encoding and decoding, and <em>EncodedVideoChunk</em> objects just represent binary data.</p>
<p>Reading video files and writing video files are their own, separate thing called muxing/demuxing.</p>
<h2 id="heading-muxing-and-demuxing">Muxing and Demuxing</h2>
<p>To write to a video file, you'll also need to <em>mux</em> the video. And to play a video file, you need to <em>demux</em> the video. This involves following the file format of the video container, parsing the video file (in the case of demuxing), or placing encoded video data in the right place in the file you are writing to (muxing).</p>
<p>Muxing and Demuxing are not included in the WebCodecs API, so you'll need to use a separate library to handle muxing and demuxing.</p>
<h3 id="heading-demuxing">Demuxing</h3>
<p>To play a video back in the browser, we need to both <em>demux</em> the video and <em>decode</em> the video, in that order.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/6e0459e9-f960-43dd-a1e3-a214091c439e.png" alt="Demuxing and decoding" style="display:block;margin:0 auto" width="960" height="720" loading="lazy">

<p>There are several libraries you can use to demux videos, including <a href="http://mediabunny.dev/">MediaBunny</a> or <a href="https://github.com/bilibili/web-demuxer">web-demuxer</a>. For the purposes of this tutorial, I put a very simplified wrapper around these libraries and exposed it in the <a href="https://www.npmjs.com/package/webcodecs-utils">webcodecs-utils</a> package, so that demuxing is a very simple 2-liner:</p>
<pre><code class="language-typescript">import { demuxVideo } from 'webcodecs-utils'
const {chunks, config} = await demuxVideo(file);
</code></pre>
<p>This reads the entire video into memory, so don't do this in practice. But it's helpful in making a simple, readable hello world for WebCodecs.</p>
<p>The following snippet will take in a video file (<em>File</em> object), decode it, and paint the result to a canvas. Here, we get the frames from the output callback, and run the draw calls directly from the callback.</p>
<pre><code class="language-typescript">import { demuxVideo } from 'webcodecs-utils'

async function playFile(file: File){

    const {chunks, config} = await demuxVideo(file);
    const canvas = document.createElement('canvas');
    const ctx = canvas.getContext('2d');

    const decoder = new VideoDecoder({
        output(frame: VideoFrame) {
            ctx.drawImage(frame, 0, 0);
            frame.close()
        },
        error(e) { console.warn(e); }
    });


    decoder.configure(config);

    for (const chunk of chunks){
        decoder.decode(chunk)
    }

}
</code></pre>
<p>Here's our super barebones demo for playing back an actual video:</p>
<div class="embed-wrapper"><iframe width="100%" height="350" src="https://codepen.io/Sam-Bhattacharyya/embed/OPRErmj" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="CodePen embed" scrolling="no" allowtransparency="true" allowfullscreen="true" loading="lazy"></iframe></div>

<p>For a more 'correct' demuxing example, here is what demuxing looks like with MediaBunny, where you can extract chunks in an iterative fashion.</p>
<pre><code class="language-typescript">import { EncodedPacketSink, Input, ALL_FORMATS, BlobSource } from 'mediabunny';

const input = new Input({
  formats: ALL_FORMATS,
  source: new BlobSource(&lt;File&gt; file),
});

const videoTrack = await input.getPrimaryVideoTrack();
const sink = new EncodedPacketSink(videoTrack);

for await (const packet of sink.packets()) {
  const chunk = &lt;EncodedVideoChunk&gt; packet.toEncodedVideoChunk();
}
</code></pre>
<h3 id="heading-muxing">Muxing</h3>
<p>To write a video file, you not only need to encode it (with the <em>VideoEncoder</em>), but you also need to <em>mux</em> it. This involves taking the encoded chunks and placing them in the right place in the output binary file that you're writing to.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/176ef792-e761-46d5-badf-58fd08d78552.png" alt="Muxing and Encoding" style="display:block;margin:0 auto" width="960" height="624" loading="lazy">

<p>Again, you need a library to mux videos (such as <a href="http://mediabunny.dev/">MediaBunny</a>), but for demo purposes I created a super simple wrapper, a basic <em>ExampleMuxer</em>:</p>
<pre><code class="language-typescript">import { ExampleMuxer } from 'webcodecs-utils'

const muxer = new ExampleMuxer('video');

for (const chunk of encodedChunks){
    muxer.addChunk(chunk);
}

const outputBlob = await muxer.finish();
</code></pre>
<p>As a full encoding + muxing demo, we'll create an encoder, and we'll set it to mux the output encoded chunks as soon as they are returned.</p>
<pre><code class="language-typescript">const encoder = new VideoEncoder({
    output: function(chunk, meta){
        muxer.addChunk(chunk, meta);
    },
    error: function(e){ console.error(e); }
})

encoder.configure({
    codec: 'avc1.4d0034', // We'll get to this
    width: 1280,
    height: 720,
    bitrate: 1_000_000, // 1 Mbps
    framerate: 25
});
</code></pre>
<p>We'll then define a canvas animation, which will draw the current frame number to the screen, just to prove it's working.</p>
<pre><code class="language-typescript">const canvas = new OffscreenCanvas(640, 360);
const ctx = canvas.getContext('2d');
const TOTAL_FRAMES=300;
let frameNumber = 0;
let chunksMuxed = 0;
const fps = 30;


function renderFrame(){
    ctx.fillStyle = '#000';
    ctx.fillRect(0, 0, canvas.width, canvas.height);
    ctx.fillStyle = 'white';
    ctx.font = `bold ${Math.min(canvas.width / 10, 72)}px Arial`;
    ctx.textAlign = 'center';
    ctx.textBaseline = 'middle';
    ctx.fillText(`Frame ${frameNumber}`, canvas.width / 2, canvas.height / 2);
}
</code></pre>
<p>Finally we'll create the encode loop, which will draw the current frame, and then encode it.</p>
<pre><code class="language-typescript">
let flushed = false;

async function encodeLoop(){

    renderFrame();

    const frame = new VideoFrame(canvas, {timestamp: frameNumber / fps * 1e6});
    encoder.encode(frame, {keyFrame: frameNumber % 60 === 0});
    frame.close();

    frameNumber++;

    if (frameNumber === TOTAL_FRAMES) {
        if (!flushed) {
            flushed = true;
            await encoder.flush();
        }
    } else {
        return requestAnimationFrame(encodeLoop);
    }
}
</code></pre>
<p>Putting it all together, you can encode the canvas animation to a video file with frame-level accuracy.</p>
<div class="embed-wrapper"><iframe width="100%" height="350" src="https://codepen.io/Sam-Bhattacharyya/embed/KwgebEJ" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="CodePen embed" scrolling="no" allowtransparency="true" allowfullscreen="true" loading="lazy"></iframe></div>

<p>You can download the video and use any video inspection tool to verify that every single frame number is included.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/776d6db5-f67c-4538-8e7d-f61e35e698ce.png" alt="Videos with frame level accuracy" style="display:block;margin:0 auto" width="915" height="630" loading="lazy">

<p>This is one of the critical distinctions that separates this from other web APIs like <a href="https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder">MediaRecorder</a> which can also encode video, but has no frame-level accuracy. WebCodecs makes sure that you can control and guarantee the consistency of each frame.</p>
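<p>That determinism comes from the explicit microsecond timestamps we set when constructing each <em>VideoFrame</em>. As a minimal sketch (the helper name is my own, not from any package mentioned here), the conversion looks like this:</p>

```typescript
// WebCodecs timestamps are integers in microseconds.
// This mirrors the `frameNumber / fps * 1e6` pattern from the encode loop.
function frameTimestampMicros(frameNumber: number, fps: number): number {
  return Math.round((frameNumber / fps) * 1_000_000);
}

// At 30 fps, frame 0 starts at 0 µs and frame 30 lands at exactly one second.
const t0 = frameTimestampMicros(0, 30);   // 0
const t30 = frameTimestampMicros(30, 30); // 1_000_000
```

<p>Because every frame carries an explicit timestamp, two runs over the same input produce identical output timing&nbsp;– something MediaRecorder can't guarantee.</p>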
<p>Finally, a proper, full muxing example using MediaBunny looks like this:</p>
<pre><code class="language-typescript">import {
  EncodedPacket,
  EncodedVideoPacketSource,
  BufferTarget,
  Mp4OutputFormat,
  Output
} from 'mediabunny';

async function muxChunks(chunks: EncodedVideoChunk[]): Promise &lt;Blob&gt;{

    const output = new Output({
        format: new Mp4OutputFormat(),
        target: new BufferTarget(),
    });

    const source = new EncodedVideoPacketSource('avc');
    output.addVideoTrack(source);

    await output.start();

    for (const chunk of chunks){
        source.add(EncodedPacket.fromEncodedChunk(chunk))
    }

    await output.finalize();
    const buffer = &lt;ArrayBuffer&gt; output.target.buffer;
    return new Blob([buffer], { type: 'video/mp4' });

}
</code></pre>
<h2 id="heading-building-a-video-converter-utility">Building a Video Converter Utility</h2>
<p>Now that we've covered the basics of WebCodecs as well as Muxing, we'll move towards actually building an MVP of something useful: a video converter utility. We'll be able to use it to convert between mp4 and webm, and do some basic operations like resizing and flipping the video.</p>
<h3 id="heading-transcoding">Transcoding</h3>
<p>Before we do resizing and flipping, let's first handle a basic conversion: decoding a video, then encoding it to a new format. This is called transcoding.</p>
<p>To transcode video, we need to set up a pipeline with the following processes:</p>
<ul>
<li><p><strong>Demuxing</strong>: Read <em>EncodedVideoChunks</em> from a video file</p>
</li>
<li><p><strong>Decoding</strong>: Convert <em>EncodedVideoChunks</em> to <em>VideoFrames</em></p>
</li>
<li><p><strong>Encoding</strong>: Convert <em>VideoFrames</em> to new <em>EncodedVideoChunks</em></p>
</li>
<li><p><strong>Muxing</strong>: Write the <em>EncodedVideoChunks</em> to a new video file</p>
</li>
</ul>
<p>Our pipeline looks something like this:</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/f06e8516-5498-44bd-9e68-e048afb303e9.png" alt="Transcoding pipeline" style="display:block;margin:0 auto" width="960" height="576" loading="lazy">

<p>Using everything we've covered so far, we could build a full working demo with just <em>VideoEncoder</em> and <em>VideoDecoder</em>. But then state management and frame tracking become complicated and error-prone.</p>
<p>We're going to add one more abstraction, using the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API">Streams API</a>, which will make our pipeline look like the snippet below. It maps directly onto our mental model of the pipeline and hides a ton of details like state management.</p>
<pre><code class="language-javascript">const transcodePipeline = demuxerReader
    .pipeThrough(new VideoDecoderStream(videoDecoderConfig))
    .pipeThrough(new VideoEncoderStream(videoEncoderConfig))
    .pipeTo(createMuxerWriter(muxer));

await transcodePipeline;
</code></pre>
<p>To do this, we'll create a TransformStream for the <em>VideoDecoder</em> and <em>VideoEncoder.</em></p>
<pre><code class="language-typescript">class VideoDecoderStream extends TransformStream&lt;{ chunk: EncodedVideoChunk; index: number }, { frame: VideoFrame; index: number }&gt; {
  constructor(config: VideoDecoderConfig) {
    let decoder: VideoDecoder;
    let pendingIndices: number[] = [];
    super(
      {
        start(controller) {
          decoder = new VideoDecoder({
            output: (frame) =&gt; {
              const index = pendingIndices.shift()!;
              controller.enqueue({ frame, index });
            },
            error: (e) =&gt; controller.error(e),
          });

          decoder.configure(config);
        },

        async transform(item, controller) {
          pendingIndices.push(item.index);
          decoder.decode(item.chunk);
        },

        async flush(controller) {
          await decoder.flush();
          if (decoder.state !== 'closed') decoder.close();
        },
      }
    );
  }
}
</code></pre>
<p>I won't bore you with the full code, but I've packaged these utilities in the webcodecs-utils package, which can be used as such:</p>
<pre><code class="language-typescript">import {
  SimpleDemuxer,
  VideoDecodeStream,
  VideoEncodeStream,
  SimpleMuxer,
} from "webcodecs-utils";
</code></pre>
<p>Our code for transcoding a file then becomes this:</p>
<pre><code class="language-typescript">const demuxer = new SimpleDemuxer(videoFile);
await demuxer.load();
const decoderConfig = await demuxer.getVideoDecoderConfig();

const encoderConfig = {/*Whatever we decide*/};

// Set up muxer
const muxer = new SimpleMuxer({ video: "avc" });

// Build the transcoding pipeline
await demuxer.videoStream()
  .pipeThrough(new VideoDecodeStream(decoderConfig))
  .pipeThrough(new VideoEncodeStream(encoderConfig))
  .pipeTo(muxer.videoSink());

// Get output
const blob = await muxer.finalize();
</code></pre>
<p>For this intermediate demo, just to actually get transcoding to work, we'll download a <a href="https://katana.video/files/hero-small.mp4">pre-built file</a>, and we'll introduce a toggle to output an mp4 file (using <code>h264</code>) or a webm file (using <code>vp9</code>).</p>
<p>We'll use <code>avc1.4d0034</code> for h264 (most widely supported h264 codec string) and <code>vp09.00.40.08.00</code> for vp9 (most widely supported vp9 string).</p>
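<p>In the demo, the toggle maps straight onto those two strings. A tiny sketch of that mapping (the helper name is illustrative, not from webcodecs-utils):</p>

```typescript
// Map the demo's output-format toggle to the codec strings above:
// mp4 pairs with the widely supported h264 main-profile string,
// webm with the vp9 level-4 string.
type OutputFormat = 'mp4' | 'webm';

function codecForFormat(format: OutputFormat): string {
  return format === 'mp4' ? 'avc1.4d0034' : 'vp09.00.40.08.00';
}
```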
<p>Here's a basic transcoding demo on CodePen:</p>
<div class="embed-wrapper"><iframe width="100%" height="350" src="https://codepen.io/Sam-Bhattacharyya/embed/YPGvBgO" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="CodePen embed" scrolling="no" allowtransparency="true" allowfullscreen="true" loading="lazy"></iframe></div>

<h3 id="heading-transformations">Transformations</h3>
<p>If we want to do any kind of transformations to the video, like flips, crops, rotations, resizing, and so on, we can't just work with pure <em>VideoFrame</em> objects.</p>
<p>The simplest way to accomplish this would be to introduce a Canvas element, where we'll use a <a href="https://developer.mozilla.org/en-US/docs/Web/API/CanvasRenderingContext2D">2d Canvas Context</a> to manipulate our source frame and draw that to a canvas.</p>
<pre><code class="language-typescript">const canvas = new OffscreenCanvas(width, height);
const ctx = canvas.getContext('2d');

// Very easy to do transformations
ctx.drawImage(sourceFrame, 0, 0);
</code></pre>
<p>We'll then use the Canvas as a source image for our output video frame.</p>
<pre><code class="language-typescript">const outFrame = new VideoFrame(canvas, {timestamp: sourceFrame.timestamp});
</code></pre>
<p>To apply a resize operation, we'll first set the canvas dimensions to our output height and width.</p>
<pre><code class="language-typescript">const canvas = new OffscreenCanvas(outputWidth, outputHeight);
const ctx = canvas.getContext('2d');

// Resize sourceFrame to fit output dimensions
ctx.drawImage(sourceFrame, 0, 0, outputWidth, outputHeight);
</code></pre>
<p>To apply a horizontal flip operation with canvas2d, we can do the following:</p>
<pre><code class="language-typescript">ctx.scale(-1, 1);
ctx.translate(-outputWidth, 0);
ctx.drawImage(sourceFrame, 0, 0, outputWidth, outputHeight);
</code></pre>
<p>You can create a full render function that applies these transformations which looks like this:</p>
<pre><code class="language-typescript">function render(videoFrame, outW, outH, flipped) {

  canvas.width  = outW;
  canvas.height = outH;

  if (flipped) {
    ctx.scale(-1, 1);
    ctx.translate(-outW, 0);
  }
  ctx.drawImage(videoFrame, 0, 0, outW, outH);

}
</code></pre>
<p>Here's an interactive demo of what these transformations look like:</p>
<div class="embed-wrapper"><iframe width="100%" height="350" src="https://codepen.io/Sam-Bhattacharyya/embed/WbGymNQ" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="CodePen embed" scrolling="no" allowtransparency="true" allowfullscreen="true" loading="lazy"></iframe></div>

<h3 id="heading-transform-pipeline">Transform Pipeline</h3>
<p>With these transformations, we need to adjust our pipeline to include a transformation step. It will take in a <em>VideoFrame</em>, apply the transforms, and return a transformed frame.</p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/0111f887-1f27-4436-a68e-ec91b2dd9959.svg" alt="Transcoding pipeline with transforms" style="display:block;margin:0 auto" width="1344" height="576" loading="lazy">

<p>In the webcodecs-utils package, there is a VideoProcessStream object for this purpose. It accepts an async function that takes in a <em>VideoFrame</em> and returns a <em>VideoFrame</em>:</p>
<pre><code class="language-typescript">import { VideoProcessStream} from "webcodecs-utils";
 
new VideoProcessStream(async (frame) =&gt; {
      // Apply transformations
      return processedFrame;
});
</code></pre>
<p>So to apply our transformations, we can set it up as so:</p>
<pre><code class="language-typescript">import { VideoProcessStream} from "webcodecs-utils";
 

const canvas = new OffscreenCanvas(outW, outH);
const ctx = canvas.getContext('2d');

const processStream = new VideoProcessStream(async (frame) =&gt; {

  // Reset any transform left over from the previous frame
  ctx.setTransform(1, 0, 0, 1, 0, 0);

  if (flipped) {
    ctx.scale(-1, 1);
    ctx.translate(-outW, 0);
  }
  ctx.drawImage(frame, 0, 0, outW, outH);

  return new VideoFrame(canvas, {timestamp: frame.timestamp});

});
</code></pre>
<p>And then our full pipeline looks like this:</p>
<pre><code class="language-typescript">const demuxer = new SimpleDemuxer(videoFile);
await demuxer.load();
const decoderConfig = await demuxer.getVideoDecoderConfig();

const encoderConfig = {/*Whatever we decide*/};

// Set up muxer
const muxer = new SimpleMuxer({ video: "avc" });

// Build the transcoding pipeline
await demuxer.videoStream()
  .pipeThrough(new VideoDecodeStream(decoderConfig))
  .pipeThrough(processStream) // Just defined this
  .pipeThrough(new VideoEncodeStream(encoderConfig))
  .pipeTo(muxer.videoSink());

// Get output
const blob = await muxer.finalize();
</code></pre>
<p>Here's a full working demo with the process pipeline:</p>
<div class="embed-wrapper"><iframe width="100%" height="350" src="https://codepen.io/Sam-Bhattacharyya/embed/PwGaLPM" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="CodePen embed" scrolling="no" allowtransparency="true" allowfullscreen="true" loading="lazy"></iframe></div>

<h3 id="heading-complete-demo">Complete Demo</h3>
<p>Now, for the complete tool, we'll make some key changes:</p>
<ul>
<li><p>You can upload your own video</p>
</li>
<li><p>We'll preview the transformations by extracting a frame</p>
</li>
<li><p>We'll add progress measurement</p>
</li>
</ul>
<p>For the input, that's trivial:</p>
<pre><code class="language-html">&lt;input type="file" onchange="handler(event)" /&gt;
</code></pre>
<p>For frame previews, we could use WebCodecs to generate a preview, but because the preview doesn't need frame-level accuracy or high performance, it's easier to just use the HTML5 VideoElement to grab a video frame from the source file.</p>
<pre><code class="language-javascript">async function getFirstFrame(file) {
  const video = document.createElement("video");
  video.src = URL.createObjectURL(file);
  video.muted = true;

  await new Promise((resolve) =&gt; video.addEventListener("loadeddata", resolve, { once: true }));
  video.currentTime = 0;
  await new Promise((resolve) =&gt; video.addEventListener("seeked", resolve, { once: true }));

  return new VideoFrame(video, {timestamp: 0});
}
</code></pre>
<p>Finally, we can calculate progress in the process function by dividing the frame's timestamp by the video's duration.</p>
<pre><code class="language-typescript">const {duration} = await demuxer.getMediaInfo();


const processStream = new VideoProcessStream(async (frame) =&gt; {

  // Reset any transform left over from the previous frame
  ctx.setTransform(1, 0, 0, 1, 0, 0);

  if (flipped) {
    ctx.scale(-1, 1);
    ctx.translate(-outW, 0);
  }
  ctx.drawImage(frame, 0, 0, outW, outH);

  // Frame timestamps are in microseconds, duration is in seconds
  const progress = frame.timestamp / (duration * 1e6); // feed this to your progress UI

  return new VideoFrame(canvas, {timestamp: frame.timestamp});

});
</code></pre>
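<p>The unit mismatch here is an easy tripwire: <em>VideoFrame.timestamp</em> is in microseconds, while container durations are usually reported in seconds. Isolated as a helper (my name, not from the package):</p>

```typescript
// Progress as a 0-1 fraction. Frame timestamps are microseconds,
// duration is seconds, hence the 1e6 factor.
function computeProgress(timestampMicros: number, durationSeconds: number): number {
  return timestampMicros / (durationSeconds * 1e6);
}

const halfway = computeProgress(5_000_000, 10); // 0.5 – five seconds into a ten-second video
```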
<p>Putting this all together, we can finally put together a full working video converter utility:</p>
<div class="embed-wrapper"><iframe width="100%" height="350" src="https://codepen.io/Sam-Bhattacharyya/embed/WbGymaj" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="CodePen embed" scrolling="no" allowtransparency="true" allowfullscreen="true" loading="lazy"></iframe></div>

<p>And that's it! We've built an MVP of something actually useful with WebCodecs 🎉, with Demuxing, Decoding, Canvas Transforms, Encoding, and Muxing.</p>
<p>The only difference between this and a full-fledged browser editing suite like CapCut is the scale and scope of transformations. The underlying video processing logic would be nearly identical.</p>
<h2 id="heading-production-concerns">Production Concerns</h2>
<p>It's great that we've been able to create something useful, but before we wrap up, it's important to cover some production-level concerns.</p>
<h3 id="heading-codecs">Codecs</h3>
<p>You might have noticed strings like <code>vp09.00.40.08.00</code> in the demos, but I glossed over the details. We'll cover that now.</p>
<p>First, WebCodecs works with specific codec strings like <code>vp09.00.40.08.00</code>, not just '<code>vp9</code>'. The following won't work:</p>
<pre><code class="language-plaintext">const encoder = new VideoEncoder({ /* output, error callbacks */ });

encoder.configure({
    codec: 'vp9', // This won't work!
    //...
})
</code></pre>
<p>As discussed previously, when decoding video, you don't really get a choice of codec. The video is already encoded, and so you need to get the codec from the video, as shown in the previous demos.</p>
<p>The demuxing libraries mentioned will identify the correct codec string, so you don't need to worry about that.</p>
<pre><code class="language-typescript">const decoderConfig = await demuxer.getVideoDecoderConfig();
//decoderConfig.codec = exact codec string for the video
</code></pre>
<p>When encoding a video, you can choose your codec. Some people care a lot about codec choice, but from a practical, pragmatic perspective, these rules of thumb should work for most developers:</p>
<ul>
<li><p>If the videos your app generates will be downloaded by users and/or you want to output mp4 files, use <code>h264</code>.</p>
</li>
<li><p>If the videos generated are for internal use or you control video playback, and you don't care about format, use <code>vp9</code> with webm (open source, better compression, most widely supported codec).</p>
</li>
<li><p>For most apps, these two options will cover you — deeper codec selection is a rabbit hole you don't need to go down yet.</p>
</li>
</ul>
<p>Once you have a codec family chosen, you need to choose a specific codec string such as <code>avc1.42001f</code>.</p>
<p>The other numbers in the string specify certain codec parameters which are less important from a developer perspective. If your goal is maximum compatibility, here's a cheat sheet for which codec strings to use:</p>
<h5 id="heading-h264-for-mp4-files"><strong>h264</strong> (for mp4 files)</h5>
<ul>
<li><p><code>avc1.42001f</code> - base profile, most compatible, supports up to 720p (<a href="https://webcodecsfundamentals.org/codecs/avc1.42001f.html">99.6% support</a>)</p>
</li>
<li><p><code>avc1.4d0034</code> - main profile, level 5.2 (supports up to 4K) (<a href="https://webcodecsfundamentals.org/codecs/avc1.4d0034.html">98.9% support</a>)</p>
</li>
<li><p><code>avc1.42003e</code> - base profile, level 6.2 (supports up to 8k) (<a href="https://webcodecsfundamentals.org/codecs/avc1.42003e.html">86.8% support</a>)</p>
</li>
<li><p><code>avc1.64003e</code> - high profile - level 6.2 (supports up to 8k) (<a href="https://webcodecsfundamentals.org/codecs/avc1.64003e.html">85.9% support</a>)</p>
</li>
</ul>
<h5 id="heading-vp9-for-webm-files"><strong>vp9</strong> (for webm files)</h5>
<ul>
<li><p><code>vp09.00.10.08.00</code> - basic, most compatible, level 1 (<a href="https://webcodecsfundamentals.org/codecs/vp09.00.10.08.00.html">99.98% support</a>)</p>
</li>
<li><p><code>vp09.00.40.08.00</code> - level 4 (<a href="https://webcodecsfundamentals.org/codecs/vp09.00.40.08.00.html">99.96% support</a>)</p>
</li>
<li><p><code>vp09.00.50.08.00</code> - level 5 (<a href="https://webcodecsfundamentals.org/codecs/vp09.00.50.08.00.html">99.97% support</a>)</p>
</li>
<li><p><code>vp09.00.61.08.00</code> - level 6 (<a href="https://webcodecsfundamentals.org/codecs/vp09.00.61.08.00.html">99.97% support</a>)</p>
</li>
</ul>
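<p>If you're curious what the dotted fields mean, a vp9 string breaks down as <code>vp09.&lt;profile&gt;.&lt;level&gt;.&lt;bit depth&gt;</code>, plus optional colour fields. A rough illustrative parser, just to make that structure concrete (not a validator&nbsp;– real strings have more rules):</p>

```typescript
// Split a vp9 codec string into its main fields.
// 'vp09.00.40.08.00' → profile 0, level 40 (i.e. level 4.0), 8-bit.
function parseVp9String(codec: string) {
  const [family, profile, level, bitDepth] = codec.split('.');
  if (family !== 'vp09') throw new Error('not a vp9 codec string');
  return { profile: Number(profile), level: Number(level), bitDepth: Number(bitDepth) };
}
```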
<p>You can also use the <em>getCodecString</em> function from the <a href="https://www.npmjs.com/package/webcodecs-utils">webcodecs-utils</a> package:</p>
<pre><code class="language-typescript">import { getCodecString } from 'webcodecs-utils'

const codec_string = getCodecString('vp9', width, height, bitrate)
</code></pre>
<p>You can find a comprehensive list of what codecs and codec strings you can use in WebCodecs <a href="https://webcodecsfundamentals.org/datasets/codec-support-table/">here</a>.</p>
<h3 id="heading-bit-rate">Bit rate</h3>
<p>On top of height and width (which you presumably know from your content) and a codec string (which we just discussed), you also need to specify a bit rate when encoding video.</p>
<p>Video compression algorithms trade off quality against file size: higher quality means bigger files, lower quality means smaller ones.</p>
<p>Here's a quick visualization of what different quality levels look like for a 1080p video encoded at different bit rates:</p>
<p><strong>300 kbps</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/43d4af76-9951-47f8-9833-c64ea8034ded.png" alt="300kbps frame" style="display:block;margin:0 auto" width="256" height="256" loading="lazy">

<p><strong>1 Mbps</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/2872effb-d8ae-4001-a82a-00338bf69168.png" alt="1Mbps frame" style="display:block;margin:0 auto" width="256" height="256" loading="lazy">

<p><strong>3 Mbps</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/e25723ca-2896-4c85-be91-56fcaf4a426b.png" alt="3 Mbps frame" style="display:block;margin:0 auto" width="256" height="256" loading="lazy">

<p><strong>10 Mbps</strong></p>
<img src="https://cdn.hashnode.com/uploads/covers/6984c5b1feda93761574fcb1/cdb983e9-d55d-49f5-9852-4e85f053f6ba.png" alt="10 Mbps frame" style="display:block;margin:0 auto" width="256" height="256" loading="lazy">

<p>Here's a quick lookup table for bitrate guidance:</p>
<table>
<thead>
<tr>
<th><strong>Resolution</strong></th>
<th><strong>Bitrate (30fps)</strong></th>
<th><strong>Bitrate (60fps)</strong></th>
</tr>
</thead>
<tbody><tr>
<td>4K</td>
<td>13-20 Mbps</td>
<td>20-30 Mbps</td>
</tr>
<tr>
<td>1080p</td>
<td>4.5-6 Mbps</td>
<td>6-9 Mbps</td>
</tr>
<tr>
<td>720p</td>
<td>2-4 Mbps</td>
<td>3-6 Mbps</td>
</tr>
<tr>
<td>480p</td>
<td>1.5-2 Mbps</td>
<td>2-3 Mbps</td>
</tr>
<tr>
<td>360p</td>
<td>0.5-1 Mbps</td>
<td>1-1.5 Mbps</td>
</tr>
<tr>
<td>240p</td>
<td>300-500 kbps</td>
<td>500-800 kbps</td>
</tr>
</tbody></table>
<p>You can also use this utility function in your own app as a quick approximation:</p>
<pre><code class="language-typescript">function getBitrate(width, height, fps, quality = 'good') {
    const pixels = width * height;

    const qualityFactors = {
      'low': 0.05,
      'good': 0.08,
      'high': 0.10,
      'very-high': 0.15
    };

    const factor = qualityFactors[quality] || qualityFactors['good'];

    // Returns bitrate in bits per second
    return pixels * fps * factor;
  }
</code></pre>
<p>The same function is also available in the webcodecs-utils package:</p>
<pre><code class="language-typescript">import { getBitrate } from 'webcodecs-utils'
</code></pre>
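<p>As a quick sanity check of the heuristic against the lookup table: 1080p at 30fps with the 'good' factor works out to roughly 5 Mbps, which sits inside the table's 4.5-6 Mbps row.</p>

```typescript
// Same heuristic as getBitrate, reduced to one line:
// bits per second ≈ pixels × fps × quality factor (0.08 for 'good').
function estimateBitrate(width: number, height: number, fps: number, factor = 0.08): number {
  return Math.round(width * height * fps * factor);
}

const bps = estimateBitrate(1920, 1080, 30); // 4_976_640, i.e. ~5 Mbps
```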
<h3 id="heading-gpu-vs-cpu">GPU vs CPU</h3>
<p>Most user devices have some type of graphics card (typically called integrated graphics). These are specialized chips with specific silicon architectures optimized for encoding and decoding video, as well as for basic graphics.</p>
<p>You might hear "GPU" and think AI data centers and gamers. But as far as web applications are concerned, almost everyone has a GPU.</p>
<p>This is important because while most front-end development deals almost exclusively with the CPU, WebCodecs and video processing work primarily on the GPU.</p>
<p>Here's a quick guide for what kind of data is stored where:</p>
<table>
<thead>
<tr>
<th><strong>Data Type</strong></th>
<th><strong>Location</strong></th>
</tr>
</thead>
<tbody><tr>
<td>VideoFrame</td>
<td>GPU</td>
</tr>
<tr>
<td>EncodedVideoChunk</td>
<td>CPU</td>
</tr>
<tr>
<td>ImageBitmap</td>
<td>GPU</td>
</tr>
<tr>
<td>ArrayBuffer</td>
<td>CPU</td>
</tr>
<tr>
<td>File</td>
<td>CPU + Disk</td>
</tr>
</tbody></table>
<p>There's a performance cost to moving data around, and this also becomes important for managing memory.</p>
<h3 id="heading-memory">Memory</h3>
<p>VideoFrame objects can be quite large&nbsp;–&nbsp;around 30MB for a single 4K frame. A user's graphics card typically reserves some portion of RAM as "Video Memory" or "VRAM", which is where <em>VideoFrame</em> objects are stored.</p>
<p>So if a user has 8GB of RAM, they would typically have 2GB of VRAM (how much is decided by the operating system).</p>
<p>If the amount of video data exceeds VRAM, your application will crash. For a typical user, that means holding more than about 67 4K frames in memory (roughly 2 seconds of 30fps video) will crash the program.</p>
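<p>The arithmetic behind that estimate is worth seeing once. Assuming uncompressed RGBA at 4 bytes per pixel and a 2 GiB VRAM budget (both are assumptions&nbsp;– real decoded pixel formats and budgets vary), the frame count works out like this:</p>

```typescript
// How many uncompressed frames fit in a given VRAM budget?
// Assumes 4 bytes per pixel (RGBA); real decoded formats (e.g. NV12) differ.
function maxFramesInVram(width: number, height: number, vramBytes: number): number {
  const frameBytes = width * height * 4; // ~33 MB for a 4K frame
  return Math.floor(vramBytes / frameBytes);
}

const budget = maxFramesInVram(3840, 2160, 2 * 1024 ** 3); // 64 frames ≈ 2s at 30fps
```

<p>Whether the exact number is 64 or 67 depends on the pixel format and how much VRAM the OS actually grants&nbsp;– the point is that the budget is a couple of seconds of 4K video, not minutes.</p>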
<h4 id="heading-when-videoframes-are-generated">When VideoFrames are generated</h4>
<p>VideoFrame objects are generated whenever you create a <code>new VideoFrame(source)</code> but also from the <code>VideoDecoder</code>, specifically the output callback. Every time a frame is generated, memory usage goes up.</p>
<h4 id="heading-how-to-remove-videoframes">How to remove VideoFrames</h4>
<p>You can't rely on standard garbage collection for VideoFrame objects. You have to explicitly call close() on a frame when you're done:</p>
<pre><code class="language-typescript">frame.close()
</code></pre>
<p>In the Streams/pipeline code and demos shown earlier, frames are actually being <a href="https://github.com/sb2702/webcodecs-utils/blob/main/src/streams/video-encode-stream.ts">closed</a> as soon as they are encoded, inside the <em>VideoProcessStream</em> and <em>VideoEncodeStream</em> implementations.</p>
<p>The other reason Streams are helpful for WebCodecs is the <code>highWaterMark</code> property (set to 10 in these utilities). What this means is that when you run:</p>
<pre><code class="language-typescript">await demuxer.videoStream()
  .pipeThrough(new VideoDecodeStream(decoderConfig))
  .pipeThrough(processStream) 
  .pipeThrough(new VideoEncodeStream(encoderConfig))
  .pipeTo(muxer.videoSink());
</code></pre>
<p>You ensure that no more than 10 video frames are in memory at any given time. The Streams API allows you to specify that limit while the browser itself deals with the logic of how to make that happen.</p>
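<p>You can model that backpressure behaviour without any video at all. Here's a toy bounded queue&nbsp;– not the Streams API itself, just an illustration of the invariant it enforces:</p>

```typescript
// Toy model of Streams backpressure: a producer may never have more
// than `highWaterMark` items in flight at once.
class BoundedPipeline {
  private inFlight = 0;
  peak = 0; // highest number of simultaneously open items observed

  constructor(private highWaterMark = 10) {}

  canAccept(): boolean {
    return this.inFlight < this.highWaterMark;
  }

  enqueue(): void {
    if (!this.canAccept()) throw new Error('would exceed highWaterMark');
    this.inFlight++;
    this.peak = Math.max(this.peak, this.inFlight);
  }

  dequeue(): void {
    this.inFlight--; // e.g. a frame was encoded and closed
  }
}
```

<p>With real streams, the browser enforces this for you: <code>pipeThrough</code> pauses the upstream reader whenever the downstream queue is full, so decoded frames never pile up faster than the encoder drains them.</p>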
<p>If you don't use the Streams API, you'll need to track memory usage and the number of open video frames yourself.</p>
<h2 id="heading-further-resources">Further Resources</h2>
<p>Through this article we've gone over the basics of video processing, introduced the core concepts of the WebCodecs API, and built an MVP of a video converter utility. This is one of the simplest possible demos which actually touches all parts of the API. We also covered some basic production concerns.</p>
<p>This is just an introduction, and only scratches the surface of WebCodecs. As simple as the API looks, building a proper, production-ready WebCodecs application requires moving beyond hello-world demos.</p>
<p>To learn more about WebCodecs, you can check out <a href="https://developer.mozilla.org/en-US/docs/Web/API/WebCodecs_API">MDN</a> and the <a href="https://webcodecsfundamentals.org/">WebCodecsFundamentals</a>, a comprehensive online textbook going much more in depth on WebCodecs.</p>
<p>You can also examine the source code of existing, production tested apps like <a href="https://www.remotion.dev/convert">Remotion Convert</a> (<a href="https://github.com/remotion-dev/remotion/tree/main/packages/convert">source code</a>) which is most similar to the demo app we covered, and <a href="http://free.upscaler.video/">Free AI Video Upscaler</a> (<a href="https://github.com/sb2702/free-ai-video-upscaler">source code</a>, <a href="https://github.com/sb2702/free-ai-video-upscaler/blob/main/src/processors/pipeline-processor.ts">processing pipeline</a>) which is the inspiration for the design patterns presented here and implemented in <a href="https://www.npmjs.com/package/webcodecs-utils">webcodecs-utils</a>.</p>
<p>Finally, while WebCodecs is harder than it looks, you can make your life a lot easier by using a library like <a href="https://mediabunny.dev/">MediaBunny</a>, which simplifies a lot of the details of things like memory management, file I/O, and other details. I use it in my own production WebCodecs applications.</p>
<p>Whether or not you actually build a full, production-grade WebCodecs application, you now at least know that it's an option&nbsp;– one that's relatively new, provides better UX with lower server costs, and is increasingly being adopted by prominent video applications like CapCut and Descript for its benefits.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Convert Video Files to a Gif in Python ]]>
                </title>
                <description>
                    <![CDATA[ Recently, I was able to convert some video files to a gif as I needed them in gif format for some of my articles. I decided to show you how I did it in 3 lines of code, so you can save yourself the extra effort of looking up a ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-convert-video-files-to-gif-in-python/</link>
                <guid isPermaLink="false">66adf10b2d0eb5bfdd6b0c24</guid>
                
                    <category>
                        <![CDATA[ gif ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Python ]]>
                    </category>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Kolade Chris ]]>
                </dc:creator>
                <pubDate>Thu, 31 Mar 2022 16:06:08 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/03/convert.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Recently, I was able to convert some video files to a gif as I needed them in gif format for some of my articles.</p>
<p>I decided to show you how I did it in 3 lines of code, so you can save yourself the extra effort of looking up a SaaS to do it for you.</p>
<h2 id="heading-how-to-convert-video-to-a-gif-in-python">How to Convert Video to a Gif in Python</h2>
<p>To convert video to gif in Python, you need to install a package called <code>moviepy</code> with pip by opening your terminal and running <code>pip install moviepy</code>.</p>
<p>This module has several methods with which you can edit and enhance videos.
<img src="https://www.freecodecamp.org/news/content/images/2022/03/ss1-3.png" alt="ss1-3" width="600" height="400" loading="lazy"></p>
<p>After successfully installing <code>moviepy</code>, you need to import <code>VideoFileClip</code> from it. This is what you'll use to load the video file from its relative path.</p>
<pre><code class="lang-py"><span class="hljs-keyword">from</span> moviepy.editor <span class="hljs-keyword">import</span> VideoFileClip
</code></pre>
<p>Next, pass the relative path of the video you want to convert into <code>VideoFileClip</code> and assign the result to a variable.</p>
<p>In the code snippet below, I call that variable <code>videoClip</code>:</p>
<pre><code class="lang-py">videoClip = VideoFileClip(<span class="hljs-string">"my-life.mp4"</span>)
</code></pre>
<p>To finally convert the video to a gif, call the <code>write_gif()</code> method on the <code>videoClip</code> variable and pass it the name you want to give the gif.</p>
<pre><code class="lang-py">videoClip.write_gif(<span class="hljs-string">"my-life.gif"</span>)
</code></pre>
<p>Open the terminal and run the file:
<img src="https://www.freecodecamp.org/news/content/images/2022/03/ss2-1.png" alt="ss2-1" width="600" height="400" loading="lazy"></p>
<p>Check the folder inside which the video file is located and you should see the gif file. If you’re using VS Code, open the sidebar by pressing <code>CTRL + B</code> and you should see the gif file.
<img src="https://www.freecodecamp.org/news/content/images/2022/03/ss3-1.png" alt="ss3-1" width="600" height="400" loading="lazy"></p>
<p>You can open the gif with VS Code too.</p>
<p>The whole code that did the conversion looks like this:</p>
<pre><code class="lang-py"><span class="hljs-keyword">from</span> moviepy.editor <span class="hljs-keyword">import</span> VideoFileClip

videoClip = VideoFileClip(<span class="hljs-string">"my-life.mp4"</span>)

videoClip.write_gif(<span class="hljs-string">"my-life.gif"</span>)
</code></pre>
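<p>If you have several videos to convert, you can derive each gif's name from its video filename before passing it to <code>write_gif()</code>. Here's a small sketch of that idea (the filenames are just examples, and the commented-out moviepy loop assumes those files exist on disk):</p>

```python
from pathlib import Path

def gif_name(video_path: str) -> str:
    # "my-life.mp4" -> "my-life.gif"
    return str(Path(video_path).with_suffix(".gif"))

# With moviepy installed, you could then batch-convert like this:
# from moviepy.editor import VideoFileClip
# for video in ["my-life.mp4", "trip.mov"]:
#     VideoFileClip(video).write_gif(gif_name(video))

print(gif_name("my-life.mp4"))  # my-life.gif
```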
<p>You can learn more about the <code>moviepy</code> module on <a target="_blank" href="https://zulko.github.io/moviepy/">their official website</a>.</p>
<p>If you have any questions, feel free to contact me on <a target="_blank" href="https://twitter.com/Ksound22">Twitter</a>.</p>
<p>Thank you for reading.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ HTML Video – How to Embed a Video Player with the HTML 5 Video Tag ]]>
                </title>
                <description>
                    <![CDATA[ Before the advent of HTML 5, web developers had to embed video on a web page with a plugin like Adobe flash player. Today, you can easily embed videos in an HTML document with the <video> tag. In this article, we'll see how the <video> tag works in H... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/html-video-how-to-embed-a-video-player-with-the-html-5-video-tag/</link>
                <guid isPermaLink="false">66adf17fd93fd5c2bc7d8f33</guid>
                
                    <category>
                        <![CDATA[ HTML ]]>
                    </category>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Kolade Chris ]]>
                </dc:creator>
                <pubDate>Tue, 08 Feb 2022 16:59:19 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2022/02/pexels-donald-tong-66134.jpg" medium="image" />
                <content:encoded>
<![CDATA[ <p>Before the advent of HTML5, web developers had to embed video on a web page with a plugin like Adobe Flash Player.</p>
<p>Today, you can easily embed videos in an HTML document with the <code>&lt;video&gt;</code> tag.</p>
<p>In this article, we'll see how the <code>&lt;video&gt;</code> tag works in HTML.</p>
<h2 id="heading-table-of-contents">Table of Contents</h2>
<ul>
<li><a class="post-section-overview" href="#heading-basic-syntax">Basic Syntax</a></li>
<li><a class="post-section-overview" href="#heading-attributes-of-the-tag">Attributes of the <code>&lt;video&gt;</code> Tag</a></li>
<li><a class="post-section-overview" href="#heading-the-src-attribute">The <code>src</code> Attribute</a></li>
<li><a class="post-section-overview" href="#heading-the-poster-attribute">The <code>poster</code> Attribute</a></li>
<li><a class="post-section-overview" href="#heading-the-controls-attribute">The <code>controls</code> Attribute</a></li>
<li><a class="post-section-overview" href="#heading-the-loop-attribute">The <code>loop</code> Attribute</a></li>
<li><a class="post-section-overview" href="#heading-the-autoplay-attribute">The <code>autoplay</code> Attribute</a></li>
<li><a target="_blank" href="$thewidthandheightattributes">The <code>width</code> and <code>height</code> Attributes</a></li>
<li><a class="post-section-overview" href="#heading-the-muted-attribute">The <code>muted</code> Attribute</a></li>
<li><a class="post-section-overview" href="#heading-the-preload-attribute">The <code>preload</code> Attribute</a></li>
<li>C<a class="post-section-overview" href="#heading-conclusion">onclusion</a></li>
</ul>
<h2 id="heading-basic-syntax">Basic Syntax</h2>
<p>Just like the <code>&lt;img&gt;</code> tag, <code>&lt;video&gt;</code> takes an <code>src</code> attribute with which you need to specify the source of the video. </p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<p>By default, it is displayed like an image in the browser:
<img src="https://www.freecodecamp.org/news/content/images/2022/02/ss-1-2.png" alt="ss-1-2" width="600" height="400" loading="lazy"></p>
<p>This CSS centers everything on the web page and changes the background color:</p>
<pre><code class="lang-css"> <span class="hljs-selector-tag">body</span> {
      <span class="hljs-attribute">display</span>: flex;
      <span class="hljs-attribute">align-items</span>: center;
      <span class="hljs-attribute">justify-content</span>: center;
      <span class="hljs-attribute">min-height</span>: <span class="hljs-number">100vh</span>;
      <span class="hljs-attribute">background-color</span>: <span class="hljs-number">#d3d3d3</span>;
    }
</code></pre>
<p>In addition, you can specify multiple video sources for the <code>&lt;video&gt;</code> with the <code>&lt;source&gt;</code> tag. This <code>&lt;source&gt;</code> tag has to carry its own <code>src</code> attribute too.</p>
<p>You can use multiple <code>&lt;source&gt;</code> tags to make different formats of the same video available. The browser will then play the format it supports.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span> <span class="hljs-attr">controls</span>&gt;</span>
   <span class="hljs-tag">&lt;<span class="hljs-name">source</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span> /&gt;</span>
   <span class="hljs-tag">&lt;<span class="hljs-name">source</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.ogg"</span> /&gt;</span>
   <span class="hljs-tag">&lt;<span class="hljs-name">source</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend .webm"</span> /&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
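<p>Optionally, each <code>&lt;source&gt;</code> tag can also carry a <code>type</code> attribute with the video's MIME type, so the browser can skip formats it can't play without downloading them first:</p>

```html
<video controls>
   <source src="weekend.mp4" type="video/mp4" />
   <source src="weekend.ogg" type="video/ogg" />
   <source src="weekend.webm" type="video/webm" />
</video>
```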
<h2 id="heading-attributes-of-the-tag">Attributes of the <code>&lt;video&gt;</code> Tag</h2>
<p>The <code>&lt;video&gt;</code> tag supports global attributes such as <code>id</code>, <code>class</code>, <code>style</code>, and so on. </p>
<p>If you're wondering what global attributes are, they are attributes supported by all HTML tags.</p>
<p>The specific attributes supported by the <code>&lt;video&gt;</code> tag include <code>src</code>, <code>poster</code>, <code>controls</code>, <code>loop</code>, <code>autoplay</code>, <code>width</code>, <code>height</code>, <code>muted</code>, <code>preload</code>, and others.</p>
<h3 id="heading-the-src-attribute">The <code>src</code> Attribute</h3>
<p>The src attribute is used to specify the source of the video. It could be a relative path to the video on your local machine or a live video link from the internet.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<p>It’s optional because you can use the <code>&lt;source&gt;</code> tag instead of it.</p>
<h3 id="heading-the-poster-attribute">The <code>poster</code> Attribute</h3>
<p>With the poster attribute, you can incorporate an image to show before the video starts playing or while it is downloading.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span> <span class="hljs-attr">poster</span>=<span class="hljs-string">"benefits-of-coding.jpg"</span>&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<p>Instead of the image of the first scene of the video, the browser will show this image:
<img src="https://www.freecodecamp.org/news/content/images/2022/02/ss-2-2.png" alt="ss-2-2" width="600" height="400" loading="lazy"></p>
<h3 id="heading-the-controls-attribute">The <code>controls</code> Attribute</h3>
<p>When you add the <code>controls</code> attribute, the browser shows playback controls such as play and pause, volume, and seek.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span>
      <span class="hljs-attr">controls</span>
      <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span>
      <span class="hljs-attr">poster</span>=<span class="hljs-string">"benefits-of-coding.jpg"</span>
&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/02/ss-3-1.png" alt="ss-3-1" width="600" height="400" loading="lazy"></p>
<h3 id="heading-the-loop-attribute">The <code>loop</code> Attribute</h3>
<p>With the loop attribute, you can make the video repeat automatically. That is, make it start playing again every time it stops playing.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span>
      <span class="hljs-attr">controls</span>
      <span class="hljs-attr">loop</span>
      <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span>
      <span class="hljs-attr">poster</span>=<span class="hljs-string">"benefits-of-coding.jpg"</span>
&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<h3 id="heading-the-autoplay-attribute">The <code>autoplay</code> Attribute</h3>
<p>The <code>autoplay</code> attribute makes the video start playing as soon as the page loads. Note that most modern browsers only allow autoplay when the video is muted.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span>
      <span class="hljs-attr">controls</span>
      <span class="hljs-attr">loop</span>
      <span class="hljs-attr">autoplay</span>
      <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span>
      <span class="hljs-attr">poster</span>=<span class="hljs-string">"benefits-of-coding.jpg"</span>
&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<h3 id="heading-the-width-and-height-attributes">The <code>width</code> and <code>height</code> Attributes</h3>
<p>You can use the width and height attributes to set the video's width and height. They accept only absolute values in pixels, written without a unit.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span>
      <span class="hljs-attr">controls</span>
      <span class="hljs-attr">loop</span>
      <span class="hljs-attr">autoplay</span>
      <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span>
      <span class="hljs-attr">width</span>=<span class="hljs-string">"350px"</span>
      <span class="hljs-attr">height</span>=<span class="hljs-string">"250px"</span>
      <span class="hljs-attr">poster</span>=<span class="hljs-string">"benefits-of-coding.jpg"</span>
&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/02/ss-4-1.png" alt="ss-4-1" width="600" height="400" loading="lazy"></p>
<h3 id="heading-the-muted-attribute">The <code>muted</code> Attribute</h3>
<p>You can use the muted attribute to tell the browser not to play any sound associated with the video when it starts playing. </p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span>
      <span class="hljs-attr">controls</span>
      <span class="hljs-attr">loop</span>
      <span class="hljs-attr">autoplay</span>
      <span class="hljs-attr">muted</span>
      <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span>
      <span class="hljs-attr">width</span>=<span class="hljs-string">"350px"</span>
      <span class="hljs-attr">height</span>=<span class="hljs-string">"250px"</span>
      <span class="hljs-attr">poster</span>=<span class="hljs-string">"benefits-of-coding.jpg"</span>
&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<p><img src="https://www.freecodecamp.org/news/content/images/2022/02/ss-5-1.png" alt="ss-5-1" width="600" height="400" loading="lazy"></p>
<p>If the <code>controls</code> attribute is specified, the user can click the volume button to unmute.</p>
<h3 id="heading-the-preload-attribute">The <code>preload</code> Attribute</h3>
<p>With the preload attribute, you can provide a hint to the browser on whether to download the video or not when the page loads.</p>
<p>This attribute matters for user experience because it controls how much data the browser may download before the user presses play.</p>
<p>You can use 3 values with the preload attribute:</p>
<ul>
<li><p><code>none</code>: the video won't load until the user presses play</p>
</li>
<li><p><code>auto</code>: the video may be downloaded even if the user never presses play</p>
</li>
<li><p><code>metadata</code>: the browser should fetch only metadata, such as the video's duration and dimensions</p>
</li>
</ul>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span>
      <span class="hljs-attr">controls</span>
      <span class="hljs-attr">loop</span>
      <span class="hljs-attr">autoplay</span>
      <span class="hljs-attr">muted</span>=<span class="hljs-string">"true"</span>
      <span class="hljs-attr">preload</span>=<span class="hljs-string">"metadata"</span>
      <span class="hljs-attr">src</span>=<span class="hljs-string">"weekend.mp4"</span>
      <span class="hljs-attr">width</span>=<span class="hljs-string">"350px"</span>
      <span class="hljs-attr">height</span>=<span class="hljs-string">"250px"</span>
      <span class="hljs-attr">poster</span>=<span class="hljs-string">"benefits-of-coding.jpg"</span>
&gt;</span><span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In this article, you learned about the HTML5 <code>&lt;video&gt;</code> tag and its attributes, so you can use it in your projects the right way.</p>
<p>Since audio is an important part of a complete video, you can also use the <code>&lt;video&gt;</code> tag to put an audio file on a web page. But in most cases, you should use the <code>&lt;audio&gt;</code> tag for this purpose for the appropriate user experience.</p>
<p>If you find this article helpful, share it with your friends and family so it can reach more people who might need it.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ Android Camera2 – How to Use the Camera2 API to Take Photos and Videos ]]>
                </title>
                <description>
                    <![CDATA[ We all use the camera on our phones and we use it a l-o-t. There are even some applications that have integrated the camera as a feature.  On one end, there is a standard way of interacting with the camera. On the other, there is a way to customize ]]>
                </description>
                <link>https://www.freecodecamp.org/news/android-camera2-api-take-photos-and-videos/</link>
                <guid isPermaLink="false">66ba4fd343a51af2a76f7563</guid>
                
                    <category>
                        <![CDATA[ Android ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Photography ]]>
                    </category>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Tomer ]]>
                </dc:creator>
                <pubDate>Thu, 29 Jul 2021 22:36:44 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2021/07/0_0d3MvPnozsSTafWk.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>We all use the camera on our phones and we use it a l-o-t. There are even some applications that have integrated the camera as a feature. </p>
<p>On one end, there is a standard way of interacting with the camera. On the other, there is a way to customize your interaction with the camera. This distinction is an important one to make. And that’s where Camera2 comes in.</p>
<h2 id="heading-what-is-camera2">What is Camera2?</h2>
<p>While it has been available since API level 21, the Camera2 API has got to be one of the more complex pieces of architecture developers have to deal with. </p>
<p>This API and its predecessor were put in place so developers could harness the power of interacting with the camera inside of their applications. </p>
<p>Similar to how there is a way to interact with the microphone or the volume of the device, the Camera2 API gives you the tools to interact with the device's camera. </p>
<p>In general, if you want to use the Camera2 API, it would probably be for more than just taking a picture or recording a video. This is because the API gives you in-depth control of the camera by exposing various classes that need to be configured per specific device.</p>
<p>Even if you've dealt with the camera previously, it is such a drastic change from the former camera API, that you might as well forget all that you know. </p>
<p>There are a ton of resources out there that try to showcase how to use this API directly, but some of them may be outdated and some don’t present the whole picture. </p>
<p>So, instead of trying to fill in the missing pieces by yourself, this article will (hopefully) be your one stop shop for interacting with the Camera2 API.</p>
<h2 id="heading-camera2-use-cases">Camera2 Use Cases</h2>
<p>Before we dive into anything, it is important to understand that if you only want to use the camera to take a picture or to record a video, you do not need to bother yourself with the Camera2 API. </p>
<p>The primary reason to use the Camera2 API is if your application requires some custom interaction with the camera or its functionality. </p>
<p>If you're interested in the former rather than the latter, I'd suggest you visit the following documentation from Google:</p>
<ol>
<li><a target="_blank" href="https://developer.android.com/training/camera/photobasics">Take Photos</a></li>
<li><a target="_blank" href="https://developer.android.com/training/camera/videobasics">Capture Video</a></li>
</ol>
<p>There you will find all the necessary steps you need to take to capture great photos and videos with your camera. But in this article, the main focus will be on how to use Camera2.</p>
<p>Now, there are some things we need to add to our manifest file:</p>
<p>Camera permissions:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">uses-permission</span> <span class="hljs-attr">android:name</span>=<span class="hljs-string">"android.permission.CAMERA"</span> /&gt;</span>
</code></pre>
<p>Camera feature:</p>
<pre><code class="lang-xml"><span class="hljs-tag">&lt;<span class="hljs-name">uses-feature</span> <span class="hljs-attr">android:name</span>=<span class="hljs-string">"android.hardware.camera"</span> /&gt;</span>
</code></pre>
<p>You will have to deal with checking if the camera permission has been granted or not, but since this topic has been covered widely, we won’t be dealing with that in this article.</p>
<h2 id="heading-how-to-set-up-the-camera2-api-components">How to Set up the Camera2 API Components</h2>
<p>The Camera2 API introduces several new interfaces and classes. Let’s break down each of them so we can better understand how to use them.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/1_nPqyLhTqxaqRytV8lV41VA.png" alt="Image" width="600" height="400" loading="lazy">
<em>Look at all those components</em></p>
<p>First off, we’ll start with the <a target="_blank" href="https://developer.android.com/reference/android/view/TextureView">TextureView</a>.</p>
<h3 id="heading-camera2-textureview-component">Camera2 TextureView Component</h3>
<p>A TextureView is a UI component that you use to display a content stream (think video). We need to use a TextureView to display the feed from the camera, whether it's a preview or before taking the picture/video. </p>
<p>Two properties that are important to use regarding the TextureView are:</p>
<ul>
<li>The SurfaceTexture field</li>
<li>The SurfaceTextureListener interface</li>
</ul>
<p>The first is where the content will get displayed, and the second has four callbacks:</p>
<ol>
<li><a target="_blank" href="https://developer.android.com/reference/android/view/TextureView.SurfaceTextureListener#onSurfaceTextureAvailable%28android.graphics.SurfaceTexture,%20int,%20int%29">onSurfaceTextureAvailable</a></li>
<li><a target="_blank" href="https://developer.android.com/reference/android/view/TextureView.SurfaceTextureListener#onSurfaceTextureSizeChanged%28android.graphics.SurfaceTexture,%20int,%20int%29">onSurfaceTextureSizeChanged</a></li>
<li><a target="_blank" href="https://developer.android.com/reference/android/view/TextureView.SurfaceTextureListener#onSurfaceTextureUpdated%28android.graphics.SurfaceTexture%29">onSurfaceTextureUpdated</a></li>
<li><a target="_blank" href="https://developer.android.com/reference/android/view/TextureView.SurfaceTextureListener#onSurfaceTextureDestroyed%28android.graphics.SurfaceTexture%29">onSurfaceTextureDestroyed</a></li>
</ol>
<pre><code class="lang-kotlin"><span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> surfaceTextureListener = <span class="hljs-keyword">object</span> : TextureView.SurfaceTextureListener {
        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onSurfaceTextureAvailable</span><span class="hljs-params">(texture: <span class="hljs-type">SurfaceTexture</span>, width: <span class="hljs-type">Int</span>, height: <span class="hljs-type">Int</span>)</span></span> {

        }
        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onSurfaceTextureSizeChanged</span><span class="hljs-params">(texture: <span class="hljs-type">SurfaceTexture</span>, width: <span class="hljs-type">Int</span>, height: <span class="hljs-type">Int</span>)</span></span> {

        }

        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onSurfaceTextureDestroyed</span><span class="hljs-params">(texture: <span class="hljs-type">SurfaceTexture</span>)</span></span> {

        }
        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onSurfaceTextureUpdated</span><span class="hljs-params">(texture: <span class="hljs-type">SurfaceTexture</span>)</span></span> {

        }
}
</code></pre>
<p>The first callback is crucial when using the camera. This is because we want to be notified when the SurfaceTexture is available so we can start displaying the feed on it. </p>
<p>Be aware that only once the TextureView is attached to a window does it become available.</p>
<p>Interacting with the camera has changed since the previous API. Now, we have the <a target="_blank" href="https://developer.android.com/reference/android/hardware/camera2/CameraManager">CameraManager</a>. This is a system service that allows us to interact with <a target="_blank" href="https://developer.android.com/reference/android/hardware/camera2/CameraDevice">CameraDevice</a> objects. </p>
<p>The methods you want to pay close attention to are:</p>
<ul>
<li><a target="_blank" href="https://developer.android.com/reference/android/hardware/camera2/CameraManager#openCamera%28java.lang.String,%20android.hardware.camera2.CameraDevice.StateCallback,%20android.os.Handler%29">openCamera</a></li>
<li><a target="_blank" href="https://developer.android.com/reference/android/hardware/camera2/CameraManager#getCameraCharacteristics%28java.lang.String%29">getCameraCharacteristics</a></li>
<li><a target="_blank" href="https://developer.android.com/reference/android/hardware/camera2/CameraManager#getCameraIdList%28%29">getCameraIdList</a></li>
</ul>
<p>After we know that the TextureView is available and ready, we need to call openCamera to open a connection to the camera. This method takes in three arguments:</p>
<ol>
<li>CameraId - String</li>
<li>CameraDevice.StateCallback</li>
<li>A Handler</li>
</ol>
<p>The CameraId argument signifies which camera we want to connect to. On your phone, there are mainly two cameras, the front and the back. Each has its own unique id. Usually, it is either a zero or a one. </p>
<p>How do we get the camera id? We use the CameraManager's <code>getCameraIdList</code> method. It returns a string array of all the camera ids available on the device.</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> cameraManager: CameraManager = getSystemService(Context.CAMERA_SERVICE) <span class="hljs-keyword">as</span> CameraManager
<span class="hljs-keyword">val</span> cameraIds: Array&lt;String&gt; = cameraManager.cameraIdList
<span class="hljs-keyword">var</span> cameraId: String = <span class="hljs-string">""</span>
<span class="hljs-keyword">for</span> (id <span class="hljs-keyword">in</span> cameraIds) {
    <span class="hljs-keyword">val</span> cameraCharacteristics = cameraManager.getCameraCharacteristics(id)
    <span class="hljs-comment">//If we want to choose the rear facing camera instead of the front facing one</span>
    <span class="hljs-keyword">if</span> (cameraCharacteristics.<span class="hljs-keyword">get</span>(CameraCharacteristics.LENS_FACING) == CameraCharacteristics.LENS_FACING_FRONT) 
      <span class="hljs-keyword">continue</span>
    }

    <span class="hljs-keyword">val</span> previewSize = cameraCharacteristics.<span class="hljs-keyword">get</span>(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)!!.getOutputSizes(ImageFormat.JPEG).maxByOrNull { it.height * it.width }!!
    <span class="hljs-keyword">val</span> imageReader = ImageReader.newInstance(previewSize.width, previewSize.height, ImageFormat.JPEG, <span class="hljs-number">1</span>)
    imageReader.setOnImageAvailableListener(onImageAvailableListener, backgroundHandler)
    cameraId = id
}
</code></pre>
<p>The next argument is a callback for the camera's state after we try to open it. If you think about it, there are only a few possible outcomes for this action:</p>
<ul>
<li>The camera manages to open successfully</li>
<li>The camera disconnects</li>
<li>Some error occurs</li>
</ul>
<p>And that’s what you will find inside the CameraDevice.StateCallback:</p>
<pre><code class="lang-kotlin"> <span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> cameraStateCallback = <span class="hljs-keyword">object</span> : CameraDevice.StateCallback() {
        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onOpened</span><span class="hljs-params">(camera: <span class="hljs-type">CameraDevice</span>)</span></span> {

        }

        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onDisconnected</span><span class="hljs-params">(cameraDevice: <span class="hljs-type">CameraDevice</span>)</span></span> {

        }

        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onError</span><span class="hljs-params">(cameraDevice: <span class="hljs-type">CameraDevice</span>, error: <span class="hljs-type">Int</span>)</span></span> {
            <span class="hljs-keyword">val</span> errorMsg = <span class="hljs-keyword">when</span>(error) {
                ERROR_CAMERA_DEVICE -&gt; <span class="hljs-string">"Fatal (device)"</span>
                ERROR_CAMERA_DISABLED -&gt; <span class="hljs-string">"Device policy"</span>
                ERROR_CAMERA_IN_USE -&gt; <span class="hljs-string">"Camera in use"</span>
                ERROR_CAMERA_SERVICE -&gt; <span class="hljs-string">"Fatal (service)"</span>
                ERROR_MAX_CAMERAS_IN_USE -&gt; <span class="hljs-string">"Maximum cameras in use"</span>
                <span class="hljs-keyword">else</span> -&gt; <span class="hljs-string">"Unknown"</span>
            }
            Log.e(TAG, <span class="hljs-string">"Error when trying to connect camera <span class="hljs-variable">$errorMsg</span>"</span>)
        }
    }
</code></pre>
<p>The third argument deals with where this work will happen. Since we don’t want to occupy the main thread, it is better to do this work in the background. </p>
<p>That’s why we need to pass a Handler to it. It would be wise to have this handler instance instantiated with a thread of our choosing so we can delegate work to it.</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">private</span> <span class="hljs-keyword">lateinit</span> <span class="hljs-keyword">var</span> backgroundHandlerThread: HandlerThread
<span class="hljs-keyword">private</span> <span class="hljs-keyword">lateinit</span> <span class="hljs-keyword">var</span> backgroundHandler: Handler

 <span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">startBackgroundThread</span><span class="hljs-params">()</span></span> {
    backgroundHandlerThread = HandlerThread(<span class="hljs-string">"CameraVideoThread"</span>)
    backgroundHandlerThread.start()
    backgroundHandler = Handler(
        backgroundHandlerThread.looper)
}

<span class="hljs-keyword">private</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">stopBackgroundThread</span><span class="hljs-params">()</span></span> {
    backgroundHandlerThread.quitSafely()
    backgroundHandlerThread.join()
}
</code></pre>
<p>With everything that we have done, we can now call openCamera:</p>
<pre><code class="lang-kotlin">cameraManager.openCamera(cameraId, cameraStateCallback,backgroundHandler)
</code></pre>
<p>Then in the <strong>onOpened</strong> callback, we can start to deal with the logic on how to present the camera feed to the user via the TextureView.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/0_hW39WzgV8lm87Ql0.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a href="https://unsplash.com/@markusspiske?utm_source=medium&amp;utm_medium=referral" rel="photo-creator noopener">Markus Spiske</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral" rel="photo-source noopener">Unsplash</a></em></p>
<h3 id="heading-how-to-show-a-preview-of-the-feed">How to Show a Preview of the Feed</h3>
<p>We've got our camera (cameraDevice) and our TextureView to show the feed. But we need to connect them to each other so we can show a preview of the feed. </p>
<p>To do that, we will be using the SurfaceTexture property of TextureView and we will be building a CaptureRequest.</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> surfaceTexture : SurfaceTexture? = textureView.surfaceTexture <span class="hljs-comment">// 1</span>

<span class="hljs-keyword">val</span> cameraCharacteristics = cameraManager.getCameraCharacteristics(cameraId) <span class="hljs-comment">//2</span>
<span class="hljs-keyword">val</span> previewSize = cameraCharacteristics.<span class="hljs-keyword">get</span>(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)!!
  .getOutputSizes(ImageFormat.JPEG).maxByOrNull { it.height * it.width }!!

surfaceTexture?.setDefaultBufferSize(previewSize.width, previewSize.height) <span class="hljs-comment">//3</span>

<span class="hljs-keyword">val</span> previewSurface: Surface = Surface(surfaceTexture)

captureRequestBuilder = cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_PREVIEW) <span class="hljs-comment">//4</span>
captureRequestBuilder.addTarget(previewSurface) <span class="hljs-comment">//5</span>

cameraDevice.createCaptureSession(listOf(previewSurface, imageReader.surface), captureStateCallback, <span class="hljs-literal">null</span>) <span class="hljs-comment">//6</span>
</code></pre>
<p>In the code above, first we get the surfaceTexture from our TextureView. Then we use the cameraCharacteristics object to get the list of supported output sizes, pick the largest one, and set it as the default buffer size of the surfaceTexture.</p>
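That size selection boils down to "largest area wins". Here is a framework-free sketch of the same logic, using a hypothetical Size class in place of android.util.Size so it can run anywhere:

```kotlin
// Stand-in for android.util.Size, so the selection logic runs without Android
data class Size(val width: Int, val height: Int)

// Pick the output size with the largest area, as the snippet above does
fun largestByArea(sizes: List<Size>): Size? =
    sizes.maxByOrNull { it.width * it.height }

fun main() {
    val supported = listOf(Size(1920, 1080), Size(1280, 720), Size(4032, 3024))
    println(largestByArea(supported)) // → Size(width=4032, height=3024)
}
```

On a real device the list would come from `getOutputSizes(ImageFormat.JPEG)` as shown in the snippet.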
<p>Next, we create a captureRequest where we pass in <strong>TEMPLATE_PREVIEW</strong>. We add our input surface to the captureRequest.</p>
<p>Finally, we start a captureSession with our input and output surfaces and our captureStateCallback, and pass in null for the handler.</p>
<p>So what is this captureStateCallback? If you remember the diagram from the beginning of this article, it is part of the CameraCaptureSession which we are starting. This object tracks the progress of the captureRequest with the following callbacks:</p>
<ul>
<li>onConfigured</li>
<li>onConfigureFailed</li>
</ul>
<pre><code class="lang-kotlin"><span class="hljs-keyword">private</span> <span class="hljs-keyword">val</span> captureStateCallback = <span class="hljs-keyword">object</span> : CameraCaptureSession.StateCallback() {
        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onConfigureFailed</span><span class="hljs-params">(session: <span class="hljs-type">CameraCaptureSession</span>)</span></span> {

        }
        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onConfigured</span><span class="hljs-params">(session: <span class="hljs-type">CameraCaptureSession</span>)</span></span> {

        }
}
</code></pre>
<p>When the <strong>cameraCaptureSession</strong> is configured successfully, we set a repeating request for the session to allow us to show the preview continuously. </p>
<p>To do that, we use the session object we get in the callback:</p>
<pre><code class="lang-kotlin"> session.setRepeatingRequest(captureRequestBuilder.build(), <span class="hljs-literal">null</span>, backgroundHandler)
</code></pre>
<p>You will recognize our captureRequestBuilder object that we created earlier as the first argument to this method. We call its build method, so what actually gets passed in is a CaptureRequest. </p>
<p>The second argument is a CameraCaptureSession.captureCallback listener, but since we don’t want to do anything with the captured images (since this is a preview), we pass in null. </p>
<p>The third argument is a handler, and here we use our own backgroundHandler. This is also why we passed in null in the previous section, since the repeating request will run on the background thread.</p>
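This division of labor can be sketched without the Android framework: a single-threaded executor plays the role of our HandlerThread-backed backgroundHandler (the function name below is illustrative, not part of the Camera2 API):

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Runs tasks on one dedicated worker thread, in submission order, and collects
// their results -- the same pattern as posting camera work to a Handler backed
// by a single HandlerThread.
fun <T> runInBackgroundInOrder(tasks: List<() -> T>): List<T> {
    val background = Executors.newSingleThreadExecutor()
    try {
        // submit() queues each task for the worker thread; get() waits for it
        return tasks.map { task -> background.submit(Callable { task() }) }
            .map { it.get() }
    } finally {
        background.shutdown() // analogous to quitSafely() on the HandlerThread
    }
}

fun main() {
    println(runInBackgroundInOrder(listOf({ "frame-1" }, { "frame-2" }))) // → [frame-1, frame-2]
}
```

Because there is only one worker thread, tasks never run concurrently with each other, which is exactly why a single HandlerThread keeps camera callbacks ordered.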
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/dicky-jiang-ovUgpiDrbrc-unsplash.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a href="https://unsplash.com/@dicky_juwono?utm_source=medium&amp;utm_medium=referral" rel="photo-creator noopener">Dicky Jiang</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral" rel="photo-source noopener">Unsplash</a></em></p>
<h2 id="heading-how-to-take-a-picture">How to Take a Picture</h2>
<p>Having a live preview of the camera is awesome, but most users will probably want to do something with it. Some of the logic that we will write to take a picture will be similar to what we did in the previous section.</p>
<ol>
<li>We will create a captureRequest</li>
<li>We will use an ImageReader and its listener to gather the photo taken</li>
<li>Using our cameraCaptureSession, we will invoke the capture method</li>
</ol>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> orientations : SparseIntArray = SparseIntArray(<span class="hljs-number">4</span>).apply {
    append(Surface.ROTATION_0, <span class="hljs-number">0</span>)
    append(Surface.ROTATION_90, <span class="hljs-number">90</span>)
    append(Surface.ROTATION_180, <span class="hljs-number">180</span>)
    append(Surface.ROTATION_270, <span class="hljs-number">270</span>)
}

<span class="hljs-keyword">val</span> captureRequestBuilder = cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_STILL_CAPTURE)
captureRequestBuilder.addTarget(imageReader.surface)

<span class="hljs-keyword">val</span> rotation = windowManager.defaultDisplay.rotation
captureRequestBuilder.<span class="hljs-keyword">set</span>(CaptureRequest.JPEG_ORIENTATION, orientations.<span class="hljs-keyword">get</span>(rotation))
cameraCaptureSession.capture(captureRequestBuilder.build(), captureCallback, <span class="hljs-literal">null</span>)
</code></pre>
<p>But what is this <a target="_blank" href="https://developer.android.com/reference/android/media/ImageReader">ImageReader</a>? Well, an ImageReader provides access to image data that is rendered onto a surface. In our case, it is the ImageReader's own surface, which we pass as an output target for our capture request. </p>
<p>If you look at the code snippet from the previous section, you will notice we have already defined an ImageReader there.</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> cameraManager: CameraManager = getSystemService(Context.CAMERA_SERVICE) <span class="hljs-keyword">as</span> CameraManager
<span class="hljs-keyword">val</span> cameraIds: Array&lt;String&gt; = cameraManager.cameraIdList
<span class="hljs-keyword">var</span> cameraId: String = <span class="hljs-string">""</span>
<span class="hljs-keyword">for</span> (id <span class="hljs-keyword">in</span> cameraIds) {
    <span class="hljs-keyword">val</span> cameraCharacteristics = cameraManager.getCameraCharacteristics(id)
    <span class="hljs-comment">//If we want to choose the rear facing camera instead of the front facing one</span>
    <span class="hljs-keyword">if</span> (cameraCharacteristics.<span class="hljs-keyword">get</span>(CameraCharacteristics.LENS_FACING) == CameraCharacteristics.LENS_FACING_FRONT) {
      <span class="hljs-keyword">continue</span>
    }

    <span class="hljs-keyword">val</span> previewSize = cameraCharacteristics.<span class="hljs-keyword">get</span>(CameraCharacteristics.SCALER_STREAM_CONFIGURATION_MAP)!!.getOutputSizes(ImageFormat.JPEG).maxByOrNull { it.height * it.width }!!
    <span class="hljs-keyword">val</span> imageReader = ImageReader.newInstance(previewSize.width, previewSize.height, ImageFormat.JPEG, <span class="hljs-number">1</span>)
    imageReader.setOnImageAvailableListener(onImageAvailableListener, backgroundHandler)
    cameraId = id
}
</code></pre>
<p>As you can see above, we instantiate an ImageReader by passing in a width and height, the image format we would like our image to be in, and the maximum number of images it can hold.</p>
<p>A property the ImageReader class has is a listener called onImageAvailableListener. This listener will get triggered once a photo is taken (since we passed in its surface as the output source for our capture request).</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> onImageAvailableListener = <span class="hljs-keyword">object</span>: ImageReader.OnImageAvailableListener{
        <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onImageAvailable</span><span class="hljs-params">(reader: <span class="hljs-type">ImageReader</span>)</span></span> {
            <span class="hljs-keyword">val</span> image: Image = reader.acquireLatestImage()
            <span class="hljs-comment">// Process the image here, then close it so the reader can accept new images</span>
            image.close()
        }
    }
</code></pre>
<p>⚠️ <strong>Make sure to close the image after processing it or else you will not be able to take another photo.</strong></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2021/07/jakob-owens-CiUR8zISX60-unsplash.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Photo by <a href="https://unsplash.com/@jakobowens1?utm_source=medium&amp;utm_medium=referral" rel="photo-creator noopener">Jakob Owens</a> on <a href="https://unsplash.com?utm_source=medium&amp;utm_medium=referral" rel="photo-source noopener">Unsplash</a></em></p>
<h2 id="heading-how-to-record-a-video">How to Record a Video</h2>
<p>To record a video, we need to interact with a new object called <a target="_blank" href="https://developer.android.com/reference/android/media/MediaRecorder">MediaRecorder</a>. The media recorder object is in charge of recording audio and video, and we will be using it to do just that.</p>
<p>Before we do anything, we need to set up the media recorder. There are various configurations to deal with and <strong>they must be in the correct order or else exceptions will be thrown</strong>. </p>
<p>Below is an example of a selection of configurations that will allow us to capture video (without audio).</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">setupMediaRecorder</span><span class="hljs-params">(width: <span class="hljs-type">Int</span>, height: <span class="hljs-type">Int</span>)</span></span> {
  <span class="hljs-keyword">val</span> mediaRecorder: MediaRecorder = MediaRecorder()
  mediaRecorder.setVideoSource(MediaRecorder.VideoSource.SURFACE)
  mediaRecorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4)
  mediaRecorder.setVideoEncoder(MediaRecorder.VideoEncoder.H264)
  mediaRecorder.setVideoSize(width, height)
  mediaRecorder.setVideoFrameRate(<span class="hljs-number">30</span>)
  mediaRecorder.setOutputFile(PATH_TO_FILE)
  mediaRecorder.setVideoEncodingBitRate(<span class="hljs-number">10_000_000</span>)
  mediaRecorder.prepare()
}
</code></pre>
<p>Pay attention to the <strong>setOutputFile</strong> method as it expects a path to the file which will store our video. At the end of setting all these configurations we need to call prepare.</p>
<p>Note that the mediaRecorder also has a start method and we must call prepare before calling it.</p>
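To make that ordering requirement concrete, here is a framework-free sketch with a hypothetical RecorderStub (not the real android.media.MediaRecorder) that enforces the same prepare-before-start contract:

```kotlin
// Hypothetical stand-in that mimics MediaRecorder's prepare-before-start contract
class RecorderStub {
    private var prepared = false
    var isRecording = false
        private set

    fun prepare() { prepared = true }

    // The real MediaRecorder throws IllegalStateException when calls are out of
    // order; this sketch does the same for start() before prepare()
    fun start() {
        check(prepared) { "start() called before prepare()" }
        isRecording = true
    }
}

fun main() {
    val recorder = RecorderStub()
    recorder.prepare() // must come first
    recorder.start()   // safe only after prepare()
    println(recorder.isRecording) // → true
}
```

Calling start() on a fresh RecorderStub throws, which is the same failure mode you hit if you skip prepare() on the real recorder.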
<p>After setting up our mediaRecorder, we need to create a capture request and a capture session.</p>
<pre><code class="lang-kotlin"><span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">startRecording</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">val</span> surfaceTexture : SurfaceTexture? = textureView.surfaceTexture
        surfaceTexture?.setDefaultBufferSize(previewSize.width, previewSize.height)
        <span class="hljs-keyword">val</span> previewSurface: Surface = Surface(surfaceTexture)
        <span class="hljs-keyword">val</span> recordingSurface = mediaRecorder.surface
        captureRequestBuilder = cameraDevice.createCaptureRequest(CameraDevice.TEMPLATE_RECORD)
        captureRequestBuilder.addTarget(previewSurface)
        captureRequestBuilder.addTarget(recordingSurface)

        cameraDevice.createCaptureSession(listOf(previewSurface, recordingSurface), captureStateVideoCallback, backgroundHandler)
    }
</code></pre>
<p>Similar to setting up the preview or taking a photograph, we have to define our input and output surfaces. </p>
<p>Here we are creating a Surface object from the surfaceTexture of the TextureView and also taking the surface from the media recorder. We are passing in the <strong>TEMPLATE_RECORD</strong> value when creating a capture request. </p>
<p>Our captureStateVideoCallback is of the same type we used for the still photo, but inside the onConfigured callback we call media recorder’s start method.</p>
<pre><code class="lang-kotlin"><span class="hljs-keyword">val</span> captureStateVideoCallback = <span class="hljs-keyword">object</span> : CameraCaptureSession.StateCallback() {
      <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onConfigureFailed</span><span class="hljs-params">(session: <span class="hljs-type">CameraCaptureSession</span>)</span></span> {

      }

      <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">onConfigured</span><span class="hljs-params">(session: <span class="hljs-type">CameraCaptureSession</span>)</span></span> {
          session.setRepeatingRequest(captureRequestBuilder.build(), <span class="hljs-literal">null</span>, backgroundHandler)
          mediaRecorder.start()
      }
  }
</code></pre>
<p>Now we are recording a video, but how do we stop recording? For that, we will be using the stop and reset methods on the mediaRecorder object:</p>
<pre><code class="lang-kotlin">mediaRecorder.stop()
mediaRecorder.reset()
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>That was a lot to process. So if you made it here, congratulations! There is no way around it – only by getting your hands dirty with the code will you start to understand how everything connects together. </p>
<p>You are more than encouraged to look at all the code featured in this article below :</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/TomerPacific/MediumArticles/tree/master/Camrea2API">https://github.com/TomerPacific/MediumArticles/tree/master/Camrea2API</a></div>
<p>Bear in mind that this is just the tip of the iceberg when it comes to the Camera2 API. There are a lot of other things you can do, like capturing a slow motion video, switching between the front and back cameras, controlling the focus, and much more.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Setup Instagram-like Video Stories in Your App ]]>
                </title>
                <description>
                    <![CDATA[ By Agam Mahajan The article will teach you how you can show multiple videos in one view, like we see in Instagram Stories. We'll also learn how to cache the videos in the user's device to help save that user's data and network calls and smooth out th... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/video-stories-and-caching-ios/</link>
                <guid isPermaLink="false">66d45d5cbd438296f45cd377</guid>
                
                    <category>
                        <![CDATA[ app development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ caching ]]>
                    </category>
                
                    <category>
                        <![CDATA[ ios app development ]]>
                    </category>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Tue, 22 Sep 2020 22:01:31 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2020/09/1_gYkQNP0BaohLJ8hDKL1C6w-1.jpeg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Agam Mahajan</p>
<p>The article will teach you how you can show multiple videos in one view, like we see in Instagram Stories.</p>
<p>We'll also learn how to cache the videos in the user's device to help save that user's data and network calls and smooth out their experience.</p>
<p>A quick note: this implementation is for iOS, but the same logic can be applied in other codebases as well.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/ezgif.com-video-to-gif--5-.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p>In general, whenever we want to play a video, we get the video URL and simply present <code>AVPlayerViewController</code> with that URL.</p>
<pre><code class="lang-swift"><span class="hljs-keyword">let</span> videoURL = <span class="hljs-type">URL</span>(string: <span class="hljs-string">"Sample-Video-Url"</span>)
<span class="hljs-keyword">let</span> player = <span class="hljs-type">AVPlayer</span>(url: videoURL!)
<span class="hljs-keyword">let</span> playerViewController = <span class="hljs-type">AVPlayerViewController</span>()
playerViewController.player = player
<span class="hljs-keyword">self</span>.present(playerViewController, animated: <span class="hljs-literal">true</span>) {
    playerViewController.player?.play()
}
</code></pre>
<p>Pretty straightforward, right?</p>
<p>But the drawback of this implementation is that you <strong>can’t customize</strong> it. Which, if you are working for a good product company, will be an everyday ask. :D</p>
<p>Alternatively, we can use <code>AVPlayerLayer</code> which will do a similar job – but it allows us to customize the view and other elements.</p>
<pre><code class="lang-swift"><span class="hljs-keyword">let</span> videoURL = <span class="hljs-type">URL</span>(string: <span class="hljs-string">"Sample-Video-Url"</span>)
<span class="hljs-keyword">let</span> player = <span class="hljs-type">AVPlayer</span>(url: videoURL!)
<span class="hljs-keyword">let</span> playerLayer = <span class="hljs-type">AVPlayerLayer</span>(player: player)
playerLayer.frame = <span class="hljs-keyword">self</span>.view.bounds
<span class="hljs-keyword">self</span>.view.layer.addSublayer(playerLayer)
player.play()
</code></pre>
<p>But what if you want to combine multiple videos, similar to <strong>Instagram stories</strong>? Then we probably have to dive in a bit deeper.</p>
<h2 id="heading-coming-back-to-the-problem-statement">Coming Back to the Problem Statement</h2>
<p>Now, let me tell you about my use case.</p>
<p>In my company, Swiggy, we want to be able to show multiple videos, where each video should be shown x number of times.</p>
<p>On top of that, it should have an Instagram-like stories feature.</p>
<ul>
<li>Video-2 should seamlessly autoplay after video-1, and so on</li>
<li>It should jump to corresponding videos whenever the user taps left or right.</li>
</ul>
<p>If you think caching could be the answer, don't worry – I’ll get to that in a bit.</p>
<h3 id="heading-multiple-layers-in-one-view">Multiple layers in one view</h3>
<p>First things first, we need to figure out how to add multiple videos in one view.</p>
<p>What we can do is create one <code>AVPlayerLayer</code> and assign the first video to it. When the first video is finished, then we assign the next video to the same <code>AVPlayerLayer</code>.</p>
<pre><code class="lang-swift"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">addPlayer</span><span class="hljs-params">(player: AVPlayer)</span></span> {
    player.currentItem?.seek(to: <span class="hljs-type">CMTime</span>.zero, completionHandler: <span class="hljs-literal">nil</span>)
    playerViewModel?.player = player
    playerView.playerLayer.player = player
}
</code></pre>
<p>To jump to the previous or next video, we can do the following:</p>
<ul>
<li>Add a tap gesture on the view</li>
<li>If the touch location ‘x’ is less than half of the screen, then assign the previous video, else assign the next video</li>
</ul>
<pre><code class="lang-swift"><span class="hljs-meta">@objc</span> <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">didTapSnap</span><span class="hljs-params">(<span class="hljs-number">_</span> sender: UITapGestureRecognizer)</span></span> {
   <span class="hljs-keyword">let</span> touchLocation = sender.location(ofTouch: <span class="hljs-number">0</span>, <span class="hljs-keyword">in</span>: view)
   <span class="hljs-keyword">if</span> touchLocation.x &lt; view.frame.width/<span class="hljs-number">2</span> {
     changePlayer(forward: <span class="hljs-literal">false</span>)
     } 
   <span class="hljs-keyword">else</span> {
     fillupLastPlayedSnap()
     changePlayer(forward: <span class="hljs-literal">true</span>)
    }
}
</code></pre>
<p>There we go. We now have our own Insta-like Stories video feature.</p>
<div class="embed-wrapper">
        <iframe width="560" height="315" src="https://www.youtube.com/embed/13ZwNq4FnbM" style="aspect-ratio: 16 / 9; width: 100%; height: auto;" title="YouTube video player" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen="" loading="lazy"></iframe></div>
<p>But our task is not done yet!</p>
<h2 id="heading-now-back-to-caching">Now Back to Caching</h2>
<p>We don't want it to be the case that every time a user navigates from one video to another, it starts to download the video from the beginning.</p>
<p>Also, if the video is shown again in the next session, we don't need to do another server call. </p>
<p>If we can cache the video, then the user’s internet will be saved. The load on the server will also be reduced.</p>
<p>Finally, the UX will improve as the user won't have to wait a long time to load the video.</p>
<p><strong>As a good developer, reducing a user’s internet usage should be our priority.</strong></p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/08/less-data-usage-happy-customer.jpg" alt="Image" width="600" height="400" loading="lazy">
<em>Less data usage, happy customer</em></p>
<h3 id="heading-load-videos-asynchronously">Load Videos Asynchronously</h3>
<p>The first thing we can use to load videos is <strong>loadValuesAsynchronously</strong>.</p>
<p>According to <a target="_blank" href="https://developer.apple.com/documentation/avfoundation/avasynchronouskeyvalueloading/1387321-loadvaluesasynchronously">the Apple documentation</a>, <strong>loadValuesAsynchronously:</strong></p>
<blockquote>
<p><em>Tells the asset to load the values of all of the specified keys (property names) that are not already loaded.</em></p>
</blockquote>
<p>The advantage here is that it saves the video until it is rendered. So it will not download the video from the start whenever the user navigates to a previous video. It will only download the part which was not rendered earlier.</p>
<p><strong>Let's look at an example</strong>: say we have Video_1 that is 15 seconds long, and the user saw 10 seconds of that video before jumping to Video_2. </p>
<p>Now if the user comes back to Video_1 again by tapping to the left, <strong>loadValuesAsynchronously</strong> will have that 10 seconds of video saved and will only download the remaining (unwatched) 5 seconds.</p>
<pre><code class="lang-swift"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">asynchronouslyLoadURLAssets</span><span class="hljs-params">(<span class="hljs-number">_</span> newAsset: AVURLAsset)</span></span> {
    <span class="hljs-type">DispatchQueue</span>.main.async {
            newAsset.loadValuesAsynchronously(forKeys: <span class="hljs-keyword">self</span>.assetKeysRequiredToPlay) {
                <span class="hljs-keyword">for</span> key <span class="hljs-keyword">in</span> <span class="hljs-keyword">self</span>.assetKeysRequiredToPlay {
                    <span class="hljs-keyword">var</span> error: <span class="hljs-type">NSError?</span>
                    <span class="hljs-keyword">if</span> newAsset.statusOfValue(forKey: key, error: &amp;error) == .failed {
                        <span class="hljs-keyword">self</span>.delegate?.playerDidFailToPlay(message: <span class="hljs-string">"Can't use this AVAsset because one of it's keys failed to load"</span>)
                        <span class="hljs-keyword">return</span>
                    }
                }

                <span class="hljs-keyword">if</span> !newAsset.isPlayable || newAsset.hasProtectedContent {
                    <span class="hljs-keyword">self</span>.delegate?.playerDidFailToPlay(message: <span class="hljs-string">"Can't use this AVAsset because it isn't playable or has protected content"</span>)
                    <span class="hljs-keyword">return</span>
                }
                <span class="hljs-keyword">let</span> currentItem = <span class="hljs-type">AVPlayerItem</span>(asset: newAsset)
                <span class="hljs-keyword">let</span> currentPlayer = <span class="hljs-type">AVPlayer</span>(playerItem: currentItem)
                <span class="hljs-keyword">self</span>.delegate?.playerDidSuccesToPlay(playerDetail: currentPlayer)
            }

        }
}
</code></pre>
<p>You can find more details on <strong>loadValuesAsynchronously</strong> at this <a target="_blank" href="https://developer.apple.com/documentation/avfoundation/avasynchronouskeyvalueloading/1387321-loadvaluesasynchronously">link</a>.</p>
<p>The caveat here is it persists video data for that session only. If the user closes and comes back to the app, the video has to be downloaded again.</p>
<p>So what other options do we have?</p>
<h3 id="heading-saving-videos-in-device">Saving Videos in Device</h3>
<p>Now comes <strong>Video Caching</strong>!</p>
<p>When the video is rendered completely, we can export the video and save it to the user’s device. When the video comes up again in their next session, we can pick the video from the device and simply load it.</p>
<p><strong>AVAssetExportSession</strong><br>According to <a target="_blank" href="https://developer.apple.com/documentation/avfoundation/avassetexportsession">Apple's documentation</a>:</p>
<blockquote>
<p><em>An object that transcodes the contents of an asset source object to create an output of the form described by a specified export preset.</em></p>
</blockquote>
<p>This means that AVAssetExportSession acts as an exporter, through which we can save the file to the user’s device. We have to give the output URL and the output file type.</p>
<pre><code class="lang-swift"><span class="hljs-keyword">let</span> exporter = <span class="hljs-type">AVAssetExportSession</span>(asset: avUrlAsset, presetName: <span class="hljs-type">AVAssetExportPresetHighestQuality</span>)
exporter?.outputURL = outputURL
exporter?.outputFileType = <span class="hljs-type">AVFileType</span>.mp4

exporter?.exportAsynchronously(completionHandler: {
    <span class="hljs-built_in">print</span>(exporter?.status.rawValue)
    <span class="hljs-built_in">print</span>(exporter?.error)
})
</code></pre>
<p>You can find more details on <strong>AVAssetExportSession</strong> at this <a target="_blank" href="https://developer.apple.com/documentation/avfoundation/avassetexportsession">link</a>.</p>
<p>Now the only thing left is to fetch the data from the cache and load the video.</p>
<p>Before loading, check if the video is present in the cache. Then fetch that local URL and give it to <strong>loadValuesAsynchronously.</strong></p>
<pre><code class="lang-swift"><span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> cacheUrl = <span class="hljs-type">FindCachedVideoURL</span>(forVideoId: videoId) {
    <span class="hljs-keyword">let</span> cacheAsset = <span class="hljs-type">AVURLAsset</span>(url: cacheUrl)
    asynchronouslyLoadURLAssets(cacheAsset)
}
<span class="hljs-keyword">else</span> {
  asynchronouslyLoadURLAssets(newAsset)
}
</code></pre>
<p>Caching will help reduce a lot of user data usage as well as server load (sometimes up to TBs of data).</p>
<h2 id="heading-other-use-cases-for-caching">Other use cases for caching</h2>
<p>What other use cases can we handle with caching? The following are examples of ways you could use caching here:</p>
<h3 id="heading-ensure-optimum-storage">Ensure Optimum Storage</h3>
<p>Before saving the video on the device, you should check whether enough storage is present on the device to do so.</p>
<pre><code class="lang-swift"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">isStorageAvailable</span><span class="hljs-params">()</span></span> -&gt; <span class="hljs-type">Bool</span> {
   <span class="hljs-keyword">let</span> fileURL = <span class="hljs-type">URL</span>(fileURLWithPath: <span class="hljs-type">NSHomeDirectory</span>() <span class="hljs-keyword">as</span> <span class="hljs-type">String</span>)
   <span class="hljs-keyword">do</span> {
      <span class="hljs-keyword">let</span> values = <span class="hljs-keyword">try</span> fileURL.resourceValues(forKeys: [.volumeAvailableCapacityForImportantUsageKey, .volumeTotalCapacityKey])
      <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> totalSpace = values.volumeTotalCapacity,
      <span class="hljs-keyword">let</span> freeSpace = values.volumeAvailableCapacityForImportantUsage <span class="hljs-keyword">else</span> {
          <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>
      }
      <span class="hljs-keyword">if</span> freeSpace &gt; minimumSpaceRequired {
         <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>
      } <span class="hljs-keyword">else</span> {
          <span class="hljs-comment">// Capacity is unavailable</span>
          <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>
      }  
    } <span class="hljs-keyword">catch</span> {}
    <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>
}
</code></pre>
<h3 id="heading-remove-deprecated-videos">Remove Deprecated Videos</h3>
<p>You can have a timestamp for each video so that you can clean up old videos from device memory after a certain number of days.</p>
<pre><code class="lang-swift"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">cleanExpiredVideos</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">let</span> currentTimeStamp = <span class="hljs-type">Date</span>().timeIntervalSince1970
        <span class="hljs-keyword">var</span> expiredKeys: [<span class="hljs-type">String</span>] = []
        <span class="hljs-keyword">for</span> videoData <span class="hljs-keyword">in</span> videosDict <span class="hljs-keyword">where</span> currentTimeStamp - videoData.value.timeStamp &gt;= expiryTime {
            <span class="hljs-comment">// video is expired. delete</span>
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> <span class="hljs-number">_</span> = popupVideosDict[videoData.key] {
                expiredKeys.append(videoData.key)
            }
        }
        <span class="hljs-keyword">for</span> key <span class="hljs-keyword">in</span> expiredKeys {
            <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> <span class="hljs-number">_</span> = popupVideosDict[key] {
                popupVideosDict.removeValue(forKey: key)
                deleteVideo(<span class="hljs-type">ForVideoId</span>: key)
            }
        }
    }
</code></pre>
<h3 id="heading-maintain-a-limited-number-of-videos">Maintain a limited number of videos</h3>
<p>You can make sure only a limited number of videos are saved in the file at a time. Let's say 10. </p>
<p>Then when the 11th video comes, you can have it delete the least recently used video and replace it with the new one. This also helps you avoid consuming too much of the user’s device storage.</p>
<pre><code class="lang-swift"><span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">removeVideoIfMaxNumberOfVideosReached</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">if</span> popupVideosDict.<span class="hljs-built_in">count</span> &gt;= maxVideosAllowed {
            <span class="hljs-comment">// remove the least recently used video</span>
            <span class="hljs-keyword">let</span> sortedDict = popupVideosDict.keysSortedByValue { (v1, v2) -&gt; <span class="hljs-type">Bool</span> <span class="hljs-keyword">in</span>
                v1.timeStamp &lt; v2.timeStamp
            }
            <span class="hljs-keyword">guard</span> <span class="hljs-keyword">let</span> videoId = sortedDict.first <span class="hljs-keyword">else</span> {
                <span class="hljs-keyword">return</span>
            }
            popupVideosDict.removeValue(forKey: videoId)
            deleteVideo(<span class="hljs-type">ForVideoId</span>: videoId)
        }
    }
</code></pre>
<h3 id="heading-measure-impact">Measure Impact</h3>
<p>Don’t forget to add logs so that you can measure the impact of your feature. I used a custom New Relic log event to do so:</p>
<pre><code class="lang-swift"> <span class="hljs-keyword">static</span> <span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">findCachedVideoURL</span><span class="hljs-params">(forVideoId id: String)</span></span> -&gt; <span class="hljs-type">URL?</span> {
        <span class="hljs-keyword">let</span> nsDocumentDirectory = <span class="hljs-type">FileManager</span>.<span class="hljs-type">SearchPathDirectory</span>.documentDirectory
        <span class="hljs-keyword">let</span> nsUserDomainMask = <span class="hljs-type">FileManager</span>.<span class="hljs-type">SearchPathDomainMask</span>.userDomainMask
        <span class="hljs-keyword">let</span> paths = <span class="hljs-type">NSSearchPathForDirectoriesInDomains</span>(nsDocumentDirectory, nsUserDomainMask, <span class="hljs-literal">true</span>)
        <span class="hljs-keyword">if</span> <span class="hljs-keyword">let</span> dirPath = paths.first {
            <span class="hljs-keyword">let</span> fileURL = <span class="hljs-type">URL</span>(fileURLWithPath: dirPath).appendingPathComponent(folderPath).appendingPathComponent(id + <span class="hljs-string">".mp4"</span>)
            <span class="hljs-keyword">let</span> filePath = fileURL.path
            <span class="hljs-keyword">let</span> fileManager = <span class="hljs-type">FileManager</span>.<span class="hljs-keyword">default</span>
            <span class="hljs-keyword">if</span> fileManager.fileExists(atPath: filePath) {
                <span class="hljs-type">NewRelicService</span>.sendCustomEvent(with: <span class="hljs-type">NewRelicEventType</span>.statusCodes,
                                                                   eventName: <span class="hljs-type">NewRelicEventName</span>.videoCacheHit,
                                                                   attributes: [<span class="hljs-type">NewRelicAttributeKey</span>.videoSize: fileURL.fileSizeString])
                <span class="hljs-keyword">return</span> fileURL
            } <span class="hljs-keyword">else</span> {
                <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
            }
        }
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
    }
</code></pre>
<p>To report the file size in a readable format, I fetch the size in bytes and format it with <code>ByteCountFormatter</code>:</p>
<pre><code class="lang-swift"><span class="hljs-class"><span class="hljs-keyword">extension</span> <span class="hljs-title">URL</span> </span>{
    <span class="hljs-keyword">var</span> attributes: [<span class="hljs-type">FileAttributeKey</span> : <span class="hljs-type">Any</span>]? {
        <span class="hljs-keyword">do</span> {
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">try</span> <span class="hljs-type">FileManager</span>.<span class="hljs-keyword">default</span>.attributesOfItem(atPath: path)
        } <span class="hljs-keyword">catch</span> <span class="hljs-keyword">let</span> error <span class="hljs-keyword">as</span> <span class="hljs-type">NSError</span> {
            <span class="hljs-built_in">print</span>(<span class="hljs-string">"FileAttribute error: \(error)"</span>)
        }
        <span class="hljs-keyword">return</span> <span class="hljs-literal">nil</span>
    }

    <span class="hljs-keyword">var</span> fileSize: <span class="hljs-type">UInt64</span> {
        <span class="hljs-keyword">return</span> attributes?[.size] <span class="hljs-keyword">as</span>? <span class="hljs-type">UInt64</span> ?? <span class="hljs-type">UInt64</span>(<span class="hljs-number">0</span>)
    }

    <span class="hljs-keyword">var</span> fileSizeString: <span class="hljs-type">String</span> {
        <span class="hljs-keyword">return</span> <span class="hljs-type">ByteCountFormatter</span>.string(fromByteCount: <span class="hljs-type">Int64</span>(fileSize), countStyle: .file)
    }
}
</code></pre>
<p>This is how you can measure your impact:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/Screenshot-2020-09-16-at-11.34.24-AM.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><strong>Total data saved = number of requests × video size = 20.3K × 2.4 MB ≈ 49 GB</strong></p>
<p>This is just two weeks of data. You do the math for the whole year. And this number will keep growing over time.</p>
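<p>As a quick sanity check, the figure above is just the cache-hit count multiplied by the average video size (both numbers read off the dashboard; treat them as illustrative):</p>

```python
# Back-of-the-envelope check of the bandwidth savings reported above:
# ~20.3K cache hits, each avoiding a ~2.4 MB video download.
requests = 20_300       # number of cache hits over two weeks
video_size_mb = 2.4     # average cached video size in MB

saved_mb = requests * video_size_mb
saved_gb = saved_mb / 1000  # using 1 GB = 1000 MB

print(f"Total data saved: {saved_gb:.1f} GB")  # Total data saved: 48.7 GB
```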
<p>That’s it! You have now built your own caching mechanism.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2020/09/yay.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<h1 id="heading-wrapping-up">Wrapping up</h1>
<p>In this article, we saw how to integrate multiple videos into one view to build an Instagram-like stories feature.</p>
<p>We also learned why and how caching plays an important role here. We saw how it helps the user save a lot of data and have a smooth user experience.</p>
<p>Do let me know if I missed something, or if you can think of any more use cases.<br>Thanks for your time. :)</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ HTML5 Video: How to Embed Video in Your HTML ]]>
                </title>
                <description>
                    <![CDATA[ Before HTML5, in order to have a video play on a webpage, you would need to use a plugin like Adobe Flash Player. With the introduction of HTML5, you can now place videos directly into the page itself. This makes it possible to have videos play on pa... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/html5-video/</link>
                <guid isPermaLink="false">66c356fa1efc805979ae9f74</guid>
                
                    <category>
                        <![CDATA[ HTML ]]>
                    </category>
                
                    <category>
                        <![CDATA[ HTML5 ]]>
                    </category>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Mon, 27 Jan 2020 00:43:00 +0000</pubDate>
                <media:content url="https://cdn-media-2.freecodecamp.org/w1280/5f9c9d71740569d1a4ca37cc.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Before HTML5, in order to have a video play on a webpage, you would need to use a plugin like Adobe Flash Player. With the introduction of HTML5, you can now place videos directly into the page itself.</p>
<p>This makes it possible to have videos play on pages that are designed for mobile devices, as plugins like Adobe Flash Player don't work on Android or iOS.</p>
<p>The HTML <code>&lt;video&gt;</code> element is used to embed video in web documents. It may contain one or more video sources, represented using the <code>src</code> attribute or the <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/HTML/Element/source">source</a> element.</p>
<p>To embed a video file, just add this code snippet and change the <code>src</code> to the path of your video file:</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span> <span class="hljs-attr">controls</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">source</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"tutorial.ogg"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"video /ogg"</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">source</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"tutorial.mp4"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"video /mpeg"</span>&gt;</span>
  Your browser does not support the video element. Kindly update it to latest version.
<span class="hljs-tag">&lt;/<span class="hljs-name">video</span> &gt;</span>
</code></pre>
<p>The <code>&lt;video&gt;</code> element is supported by all modern browsers. However, not all browsers support the same video file format.  MP4 files are the most widely accepted format, and other formats like WebM and Ogg are supported in Chrome, Firefox, and Opera.</p>
<p>To ensure your video plays in most browsers, it's best practice to encode them into both Ogg and MP4 formats, and include both in the <code>&lt;video&gt;</code> element like in the example above. Browsers will use the first recognized format.</p>
<p>If for some reason the browser doesn't recognize any of the formats, the text "Your browser does not support the video element. Kindly update it to latest version" will be displayed instead.</p>
<p>You also might have noticed <code>controls</code> in the <code>&lt;video&gt;</code> tag. The <code>&lt;video&gt;</code> element supports a number of useful attributes for customizing the playback experience.</p>
<h2 id="heading-attributes"><code>&lt;video&gt;</code> attributes</h2>
<h3 id="heading-controls"><code>controls</code></h3>
<p>The <code>controls</code> attribute handles whether controls such as the play/pause button or volume slider appear. </p>
<p>This is a boolean attribute, meaning it can be set to either true or false. To set it to true, simply add it to the <code>&lt;video&gt;</code> tag. If it's not present in the tag then it will be set to false and the controls won't appear.</p>
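<p>For example, this embeds a video with the browser's default controls visible (the file name is just a placeholder):</p>

```html
<video controls>
  <source src="video.mp4" type="video/mp4">
</video>
```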
<h4 id="heading-autoplay"><code>autoplay</code></h4>
<p>"autoplay" can be set to either true or false. You set it to true by adding it into the tag, if it is not present in the tag it is set to false. If set to true, the video will begin playing as soon as enough of the video has buffered for it to be able to play. Many people find autoplaying videos as disruptive or annoying. So use this feature sparingly. Also note, that some mobile browsers, such as Safari for iOS, ignore this attribute.</p>
<p>This is another boolean attribute. By including <code>autoplay</code> in the <code>&lt;video&gt;</code> tag, the embedded video will begin playing as soon as enough of it has buffered.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span> <span class="hljs-attr">autoplay</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">source</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"video.mp4"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"video/mp4"</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<p>Keep in mind that many people find autoplaying videos disruptive or annoying, so use this feature sparingly. Also note that some mobile browsers like Safari for iOS ignore this attribute entirely.</p>
<h4 id="heading-poster"><code>poster</code></h4>
<p>The <code>poster</code> attribute specifies an image to display in place of the video until the user clicks play.</p>
<pre><code class="lang-html"><span class="hljs-tag">&lt;<span class="hljs-name">video</span> <span class="hljs-attr">poster</span>=<span class="hljs-string">"poster.png"</span>&gt;</span>
  <span class="hljs-tag">&lt;<span class="hljs-name">source</span> <span class="hljs-attr">src</span>=<span class="hljs-string">"video.mp4"</span> <span class="hljs-attr">type</span>=<span class="hljs-string">"video/mp4"</span>&gt;</span>
<span class="hljs-tag">&lt;/<span class="hljs-name">video</span>&gt;</span>
</code></pre>
<h3 id="heading-videos-can-be-expensive">Videos can be expensive</h3>
<p>While it's easier than ever to include videos on your page, it's often better to upload your videos to a service like YouTube, Vimeo, or Wistia and embed their code instead. This is because serving videos can be expensive, both for you in terms of server costs and for your viewers if they have limited data plans.</p>
<p>Hosting your own video files can also lead to problems with bandwidth, which could mean stuttering or slow-loading videos. On top of that, browsers tend to vary in quality when it comes to video playback, so it's hard to control exactly what your viewers will see. It's also very easy to download videos embedded with the <code>&lt;video&gt;</code> tag, so if you're concerned about piracy you might want to look into other options.</p>
<p>And with that, go forth and embed videos to your heart's content. Or not – it's your call.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ HLS Video Streaming: What it is, and When to Use it ]]>
                </title>
                <description>
                    <![CDATA[ By Anton Garcia Diaz In this short article I will focus on HLS, the most extended adaptive bitrate protocol for video delivery. I'll answer some of the main questions that anyone considering HLS for the first time will likely ask: what it is, when to... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/what-is-hls-and-when-to-use-it/</link>
                <guid isPermaLink="false">66d45da33dce891ac3a967b0</guid>
                
                    <category>
                        <![CDATA[ Adaptive Bitrate ]]>
                    </category>
                
                    <category>
                        <![CDATA[ hls ]]>
                    </category>
                
                    <category>
                        <![CDATA[ media ]]>
                    </category>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Video.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ VideoJS ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 18 Dec 2019 22:59:12 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2019/12/HLS-video.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Anton Garcia Diaz</p>
<p>In this short article I will focus on HLS, the most widely used adaptive bitrate protocol for video delivery. I'll answer some of the main questions that anyone considering HLS for the first time will likely ask: what it is, when to use it, and how to use it. </p>
<p>To help along the way, I will show some examples using <a target="_blank" href="https://abraia.me/video/">an online video publishing tool</a> that you can freely use to test out the performance of HLS on your own.</p>
<h2 id="heading-what-is-hls-and-how-does-it-work">What is HLS and how does it work?</h2>
<p>HLS is a protocol defined by Apple to implement an adaptive bitrate streaming format that can be supported on their devices and software. Over time, it has gained widespread support. </p>
<p>The most important feature of HLS is its ability to adapt the bitrate of the video to the actual speed of the connection. This optimizes the quality of the experience. </p>
<p>HLS videos are encoded in different renditions at different resolutions and bitrates. This is usually referred to as the bitrate ladder. When a connection gets slower, the protocol automatically adjusts the requested bitrate to the bandwidth available. </p>
<p>Compared to progressive videos, HLS avoids re-buffering and stalling effects, as well as saturating the client's connection. We can see it at work in this video.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/HLS-video/HLS_video-at-work/index.html">https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/HLS-video/HLS_video-at-work/index.html</a></div>
<p>In essence, HLS provides a much better user experience when we use video content in our apps or sites.</p>
<p>It has native support in iOS and Android. It is also supported by Safari, and by using some JavaScript it is supported in all the main browsers (Chrome, Firefox, Edge). While using HLS requires some effort, it's not a big deal. </p>
<p>Let's see when we should use it and how.</p>
<h2 id="heading-when-should-we-use-hls">When should we use HLS?</h2>
<p>There are cases where videos are not that heavy. For instance, you could have a sequence of images encoded as a 1-2 seconds video, with a weight of less than 1 MB. In this case, a progressive video – that can be consumed, like an image, using plain HTML5 – is for sure the best option. HLS does not offer any advantage here.</p>
<p>But, HLS does make sense when we want to deliver high resolution videos (HD or over) with a weight over 3MB. This type of content may kill our web UX when viewed on an average mobile connection. </p>
<p>It's worth noting that this is the case for an increasing amount of media content, including many short videos of less than 20 seconds used in ecommerce and marketing contexts. In the example at the beginning of the post, we have a full HD video of only 9 seconds that weighs in at over 6MB.</p>
<h2 id="heading-how-can-we-use-hls-in-our-sites">How can we use HLS in our sites?</h2>
<p>To use HLS we have to address a number of aspects. I'll focus on two important points:  </p>
<ul>
<li>the need to encode the video, and</li>
<li>the need to embed it in our page.</li>
</ul>
<p>For a more comprehensive view on what a general video publishing pipeline entails, you may check out <a target="_blank" href="https://www.freecodecamp.org/news/short-videos-in-web-and-ecommerce-workflows/">this post</a>.</p>
<h3 id="heading-hls-encoding">HLS encoding</h3>
<p>We can encode videos in HLS in-house or by using a third party service. To build an in-house encoder, the best option is to use FFmpeg, a powerful open source library for video processing and encoding. In this case, we should analyse the content we are going to encode and set a number of parameters. </p>
<p>In HLS we should define a bitrate ladder (the bitrates and resolutions of each step) and the length of the chunks. When we encode a video, we end up with a set of playlists and chunks: the former typically have .m3u8 extensions and the latter .ts. We can see an example in the next image.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/12/imaxe.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>We can see one master playlist, one additional playlist per rendition, and all the chunks of each rendition. The master playlist specifies the bitrate ladder and the relative path to each rendition.</p>
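<p>For illustration, a master playlist for a two-rendition ladder might look like this (bandwidths, resolutions, and paths are made up for the example):</p>

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2800000,RESOLUTION=1280x720
720p/playlist.m3u8
```

<p>Each rendition playlist in turn lists its .ts chunks; the client picks a rendition from this file based on the bandwidth it measures.</p>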
<p>Apple makes a generic recommendation specifying the bitrate ladder and a chunk duration of 10 seconds.  However, this is not very useful for many types of content, like the short videos common in ecommerce and marketing. </p>
<p>In fact, the best approach is to tune the bitrate ladder specifically to the content of the video. In this case, if you want to make the most of HLS and you're not an expert in encoding, a third party service providing per-title encoding (with HLS) is likely the right choice.</p>
<h2 id="heading-hls-players">HLS players</h2>
<p>Here, we find two main options. We can stick to the HTML5 player or we can use one implemented in JavaScript.</p>
<h3 id="heading-html5-player">HTML5 player</h3>
<p>Recent Safari versions support HLS. In this case, you may use HLS playlists in the same manner as progressive videos. With other browsers, you may use a tiny JavaScript library to implement the HLS protocol and again use the HTML5 player for progressive videos. </p>
<p>This can be done with hls.js. This library implements the negotiation of renditions based on the available bandwidth. Support is almost universal, conditional only on support for Media Source Extensions.</p>
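<p>As a minimal sketch (the playlist URL is a placeholder), the usual pattern is to feed the playlist to the native player where HLS is supported, and fall back to hls.js everywhere else:</p>

```html
<video id="video" controls></video>
<script src="https://cdn.jsdelivr.net/npm/hls.js@1"></script>
<script>
  const video = document.getElementById('video');
  const src = 'https://example.com/streams/master.m3u8'; // placeholder URL
  if (video.canPlayType('application/vnd.apple.mpegurl')) {
    // Safari and iOS: native HLS support
    video.src = src;
  } else if (Hls.isSupported()) {
    // Other browsers: hls.js on top of Media Source Extensions
    const hls = new Hls();
    hls.loadSource(src);
    hls.attachMedia(video);
  }
</script>
```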
<h3 id="heading-javascript-player">JavaScript Player</h3>
<p>In case we need to customise the video experience – which is pretty common in marketing and stories pages – then we need to use something other than the default HTML5 player. </p>
<p>While there are many commercial options out there, Video.js is a good choice. It's an open source player that supports a high degree of customization, including different skins and controls. </p>
<p>A player like Video.js also supports the tracking of video-related events (like play or pause actions) so we can include them in our own analytics. In fact, including these data in our Google Analytics is really easy.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/12/imaxe-2.png" alt="Image" width="600" height="400" loading="lazy">
<em>GA data for events tracked in a video viewed with a Video.js player</em></p>
<h2 id="heading-summary">Summary</h2>
<p>I've tackled the first questions about HLS that most potential users will have: what it is, and when we should use it.</p>
<p>While a video publishing pipeline reliant on HLS can be implemented and deployed in-house with open source tools like FFmpeg and Video.js, it may be a good idea to use a <a target="_blank" href="https://abraia.me/video/">video publishing service</a> if you're not an expert in the tech. They bring advanced features like per-title encoding, take care of all the hard work, and let us focus on our customization needs.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to deploy a complete video publishing pipeline for web and ecommerce ]]>
                </title>
                <description>
                    <![CDATA[ By Anton Garcia Diaz From ffmpeg and cloud video transcoding to HLS, delivery, players, Video.js, and analytics. After the conquest of social networks, video is spreading through web businesses. As a media consultant working for several of the larges... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/short-videos-in-web-and-ecommerce-workflows/</link>
                <guid isPermaLink="false">66d45d9f3a8352b6c5a2aa03</guid>
                
                    <category>
                        <![CDATA[ ecommerce ]]>
                    </category>
                
                    <category>
                        <![CDATA[ hls ]]>
                    </category>
                
                    <category>
                        <![CDATA[ publishing ]]>
                    </category>
                
                    <category>
                        <![CDATA[ technology ]]>
                    </category>
                
                    <category>
                        <![CDATA[  #Transcoding  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Video.js ]]>
                    </category>
                
                    <category>
                        <![CDATA[ VideoJS ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Wed, 13 Nov 2019 08:00:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2019/11/Video-Publishing-Demo.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Anton Garcia Diaz</p>
<p><em>From ffmpeg and cloud video transcoding to HLS, delivery, players, Video.js, and analytics.</em></p>
<p>After the conquest of social networks, video is spreading through web businesses. As a media consultant working for several of the <a target="_blank" href="https://www.similarweb.com/top-websites/category/lifestyle/fashion-and-apparel">largest fashion ecommerce sites</a> in the world, I feel safe saying the video-everywhere trend is all but unstoppable.  </p>
<p>In this post, I review the main aspects to consider when publishing short-format videos in a web workflow. I comment on open source resources that make an in-house solution possible for each step, like ffmpeg or Video.js. In addition, I use an example with <a target="_blank" href="https://abraia.me/video/">Abraia's video optimization and publishing demo</a>, specially tailored to short videos for fashion ecommerce. </p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/Short-Video-Publishing-Demo/Workflow/index.html">https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/Short-Video-Publishing-Demo/Workflow/index.html</a></div>
<p>It gives full access to the resources created: chunks, playlists, and html code for the video player. This brings quick insights on the inner workings of a complete pipeline. </p>
<p>The content should be helpful either to pursue an in-house processing and publishing pipeline or to sort out the best combination of services. </p>
<h2 id="heading-quality-of-experience-qoe-and-other-business-related-concerns">Quality of experience (QoE) and other business related concerns.</h2>
<p>There are two main concerns that are closely linked. The fear of bloating the bandwidth of users, which damages UX and engagement, and the fear of delivering poor visual quality, which potentially damages brand image. </p>
<p>The balance between these two antagonising factors is what determines the QoE. Keeping a <strong>high QoE</strong> requires delivering nearly the <strong>best possible quality, without rebuffering or stalling</strong> effects or noticeable drops of page speed.  </p>
<p>Of course, there are other issues that matter:</p>
<ul>
<li>the customisation of the viewing experience to match the branding of the business</li>
<li>the cost increase of delivering higher bandwidth content </li>
<li>and the additional burden in terms of devops</li>
</ul>
<p>...just to name a few.</p>
<h2 id="heading-a-first-choice-progressive-vs-adaptive-bitrate-abr">A first choice: progressive vs adaptive bitrate (ABR).</h2>
<p>Regarding <a target="_blank" href="https://www.freecodecamp.org/news/video-formats-for-the-web/">video format selection</a>, there are two main options with important implications: progressive video and ABR.</p>
<p>Progressive videos may be delivered and consumed like images, using plain HTML5 code. Moreover, progressive mp4 videos with H264 encoding have universal support across browsers and systems. So, they're the straightforward approach.</p>
<p>However, in the likely event that QoE is a main concern, we should go for ABR; more specifically for HLS (again with H264 encoding), which is a broadly supported option. </p>
<p>With <strong>HLS</strong> we'll be able, in most cases, to keep the <strong>bits per second (the bitrate) of the video within the connection's capacity limits</strong>. This avoids rebuffering, stalling, or blocking other content. In HLS, the video is available at different bitrates and is split into pieces. This allows the client to request the best affordable quality, based on the network speed at any time. The only caveat is that we'll need to use a player in our front-end (basically a piece of JavaScript). In apps, it's easier because both iOS and Android feature native support for the protocol. </p>
<h2 id="heading-the-pipeline-and-the-workflow">The pipeline and the workflow</h2>
<p>That said, let's see what a video optimization and delivery pipeline for web entails. The pipeline is supposed to process a master or pristine video with a high quality and make it suited to the web. It's also supposed to meet brand requirements on visualization, and to integrate the video events in the analytics of the site. </p>
<p>In sum, our pipeline should address the following problems:</p>
<ul>
<li>Content management</li>
<li>Transcoding and optimization</li>
<li>Delivery</li>
<li>Visualization</li>
<li>Analytics</li>
</ul>
<p>In the end, the pipeline should allow a workflow similar to that of social video platforms - where you upload a video and get a <a target="_blank" href="https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/Short-Video-Publishing-Demo/Video-Freecodecamp/index.html">link like this</a> to embed or share elsewhere - but under all the custom requirements of our business.</p>
<p>To keep this post short and focused, I'll skip the content management issue, which is basically the way we handle all the resources, including the collaborative media editing and approval workflows. Next, I go through the main optimization and delivery ingredients found in a video publishing pipeline.</p>
<h2 id="heading-transcoding-and-optimization">Transcoding and optimization</h2>
<p>For progressive videos to be responsive, we can create versions with different resolutions and qualities to be consumed based on breakpoints, similar to images. </p>
<p>In an in-house scheme this operation can be <a target="_blank" href="https://medium.com/abraia/video-transcoding-and-optimization-for-web-with-ffmpeg-made-easy-511635214df0">easily accomplished with ffmpeg</a>. It's an open source tool that performs resizing, compression, and many other operations very efficiently. For instance, to scale a 4K video to Full HD with good visual quality you may simply use:</p>
<pre><code>ffmpeg -y -i input.mp4 -vf scale=<span class="hljs-number">1920</span>:<span class="hljs-number">-2</span> -c:v libx264 -crf <span class="hljs-number">22</span> -profile:v high -pix_fmt yuv420p -color_primaries <span class="hljs-number">1</span> -color_trc <span class="hljs-number">1</span> -colorspace <span class="hljs-number">1</span> -movflags +faststart -an output.mp4
</code></pre><p>Alternatively, with a cloud platform the operation should be a no-brainer, although in many cases we lose effective control of the quality settings and possible breakpoints.</p>
<p>Encoding for <strong>HLS</strong> is a bit trickier. First, <strong>we have to define a coding ladder</strong>. Each step of the ladder features a different bitrate, from a maximum down to a minimum, which set the maximum and minimum quality respectively. </p>
<p>For each bitrate in the ladder, we also have to set the resolution, again from maximum to minimum. Ideally, we should use bitrates specifically tuned to the video content to optimize the use of bandwidth. When done automatically, <strong>on a per-video basis</strong>, this is called <strong>per-title encoding</strong>. </p>
<p>We have to encode the video at the resolutions and bitrates defined, and then cut each rendition into chunks. We also have to decide the chunk duration. That is, how frequently HLS renegotiates the quality to request based on the current network speed. We can do all of the encoding with ffmpeg or with a cloud service.</p>
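<p>To make the ladder concrete, here's a minimal sketch - in Node, since we'll need JavaScript in the front-end anyway - that builds one ffmpeg command per rendition. The ladder values and chunk duration here are illustrative, not a tuned recommendation:</p>

```javascript
// Illustrative coding ladder: resolutions and bitrates are example
// values; per-title encoding would tune them to each video's content.
const ladder = [
  { name: '1080p', width: 1920, height: 1080, kbps: 3400 },
  { name: '720p', width: 1280, height: 720, kbps: 1800 },
  { name: '480p', width: 856, height: 480, kbps: 1000 },
  { name: '360p', width: 640, height: 360, kbps: 650 },
];

// Build the ffmpeg command that encodes one HLS rendition at a
// constant bitrate, cut into 4-second chunks with its own playlist.
function hlsCommand(input, step) {
  return [
    'ffmpeg', '-y', '-i', input,
    '-vf', `scale=${step.width}:${step.height}`,
    '-c:v', 'libx264',
    '-b:v', `${step.kbps}k`,
    '-maxrate', `${step.kbps}k`,
    '-bufsize', `${2 * step.kbps}k`,
    '-hls_time', '4', // chunk duration in seconds
    '-hls_playlist_type', 'vod',
    `${step.name}.m3u8`,
  ].join(' ');
}

const commands = ladder.map((step) => hlsCommand('input.mp4', step));
```

<p>Each command produces one rendition playlist and its chunks; the master playlist then just lists the renditions with their bandwidths and resolutions.</p>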
<p>Let's see the files generated for our example. We have a folder containing all the chunks (.ts extension) and the playlists (.m3u8 extension). </p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/11/imaxe-7.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The playlists contain all the information about the renditions available. Next, we can see the content of the master playlist: the ladder - bitrates and resolutions - and the relative route to the renditions.</p>
<pre><code>#EXTM3U
#EXT-X-VERSION:<span class="hljs-number">3</span>
#EXT-X-STREAM-INF:BANDWIDTH=<span class="hljs-number">3374012</span>,RESOLUTION=<span class="hljs-number">1920</span>x1080
<span class="hljs-number">1080</span>p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=<span class="hljs-number">1836580</span>,RESOLUTION=<span class="hljs-number">1280</span>x720
<span class="hljs-number">720</span>p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=<span class="hljs-number">1002050</span>,RESOLUTION=<span class="hljs-number">856</span>x480
<span class="hljs-number">480</span>p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=<span class="hljs-number">649329</span>,RESOLUTION=<span class="hljs-number">640</span>x360
<span class="hljs-number">360</span>p.m3u8
</code></pre><p>That is, for each rendition we have an additional playlist containing the duration of and route to the corresponding chunks. We also need a poster to use as a thumbnail and to cover us in the event of a very slow connection or HLS compatibility issues. In our example, all the resources are in the same folder, so the route to each resource is simply its name.</p>
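<p>To see how a client reads this structure, here's a minimal sketch of parsing a master playlist like the one above into its renditions. It assumes the simple layout shown, where each #EXT-X-STREAM-INF line is followed by the route to the rendition playlist:</p>

```javascript
// Parse an HLS master playlist into its renditions: each
// #EXT-X-STREAM-INF line carries the bandwidth and resolution,
// and the following line is the route to the rendition playlist.
function parseMasterPlaylist(text) {
  const lines = text.split('\n').map((l) => l.trim()).filter(Boolean);
  const renditions = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].startsWith('#EXT-X-STREAM-INF:')) {
      const attrs = lines[i].slice('#EXT-X-STREAM-INF:'.length);
      const bandwidth = Number(/BANDWIDTH=(\d+)/.exec(attrs)[1]);
      const resolution = /RESOLUTION=(\d+x\d+)/.exec(attrs)[1];
      renditions.push({ bandwidth, resolution, uri: lines[i + 1] });
    }
  }
  return renditions;
}

const master = `#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=3374012,RESOLUTION=1920x1080
1080p.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1836580,RESOLUTION=1280x720
720p.m3u8`;

const renditions = parseMasterPlaylist(master);
// renditions[0] → { bandwidth: 3374012, resolution: '1920x1080', uri: '1080p.m3u8' }
```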
<h2 id="heading-delivery">Delivery</h2>
<p><strong>Videos should be delivered through a CDN</strong>. If you do a poor job transcoding, many users may suffer slow page loads. But at least if you use a CDN you won't take your site down because the server is unable to handle the load. I've seen big sites more than double their peak traffic the day they decided to use videos on their home page. So videos, whether progressive or HLS, should be delivered as static files cached and served by a CDN.</p>
<p>If you are using a cloud platform for video publishing, you should be covered. Any decent one offers video delivery through at least one CDN. If you need coverage in some countries like China, you need to look into each specific platform and the CDN used, since some of them do not work there.</p>
<h2 id="heading-visualization">Visualization</h2>
<p>While for progressive videos HTML5 is enough to ensure visualization, in the case of HLS we need a <strong>JavaScript player with HLS support</strong>. </p>
<p>There are many commercial options, but there are also open source alternatives of very high quality. A good example is <strong>Video.js</strong>. It has wide support among browsers, only limited by its dependency on the <a target="_blank" href="https://caniuse.com/#search=media%20source">Media Source Extensions API</a>. It offers a high degree of customization through skins and a flexible configuration, for instance allowing you to use autoplay or different video controls.</p>
<p>The player may be inserted in the page code, or it can live in a static HTML page that is embedded as an iframe. </p>
<p>Going back to our example, when we publish the video we create an <a target="_blank" href="https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/Tests/PexelsVideos2795392/index.html">HTML resource</a> that has a Video.js player with default settings. The content URL should point to the master playlist and the thumbnail to the poster image extracted from the video.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/11/imaxe-3.png" alt="Image" width="600" height="400" loading="lazy"></p>
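<p>As a sketch of what that default setup looks like in code, this builds a Video.js configuration for an HLS source. The playlist and poster URLs are placeholders for your own resources:</p>

```javascript
// Build the Video.js configuration for an HLS video: the source points
// to the master playlist and the poster to a frame extracted from the
// video. The URLs here are placeholders, not real resources.
function playerOptions(playlistUrl, posterUrl) {
  return {
    controls: true,
    fluid: true, // resize the player with its container
    poster: posterUrl,
    sources: [{
      src: playlistUrl,
      type: 'application/x-mpegURL', // MIME type for HLS playlists
    }],
  };
}

const options = playerOptions('video.m3u8', 'poster.jpg');
```

<p>In the page, <code>videojs('my-video', playerOptions('video.m3u8', 'poster.jpg'))</code> attaches the player to a <code>&lt;video id="my-video"&gt;</code> element.</p>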
<p>In this case, the HTML resource also adds <strong>oEmbed compatibility</strong>. Besides accessing this HTML directly in the browser - or a different page into which we copy and paste the player's code - to play <a target="_blank" href="https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/Tests/PexelsVideos2795392/index.html">the video</a>, we can embed it in a content management system (CMS), as I did when writing this post for freeCodeCamp.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/Short-Video-Publishing-Demo/Embedding/index.html">https://store.abraia.me/05bf471cbb3f9fa9ed785718e6f60e28/Short-Video-Publishing-Demo/Embedding/index.html</a></div>
<h2 id="heading-analytics">Analytics</h2>
<p>In short videos, typical analytics of interest are the <strong>ratio of users that play the video, the ratio of those who view it in full, or the ratio of playback failures</strong>. </p>
<p>Again, there are many commercial options available. However, in many cases a widespread free option like Google Analytics (GA) may be enough. If we are using Video.js, we only have to instrument the HTML resource with GA, like any other web page. Going back to our example, we can see it in the editable HTML created.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/11/imaxe-5.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p>To track the video use in GA, we just have to track the video events in the player. For instance:</p>
<pre><code>    player.analytics({
      <span class="hljs-attr">defaultVideoCategory</span>: <span class="hljs-string">'Video'</span>,
      <span class="hljs-attr">events</span>: [{
        <span class="hljs-attr">name</span>: <span class="hljs-string">'play'</span>,
        <span class="hljs-attr">label</span>: <span class="hljs-string">'Video-Freecodecamp'</span>,
        <span class="hljs-attr">action</span>: <span class="hljs-string">'play'</span>,
      }, {
        <span class="hljs-attr">name</span>: <span class="hljs-string">'pause'</span>,
        <span class="hljs-attr">label</span>: <span class="hljs-string">'Video-Freecodecamp'</span>,
        <span class="hljs-attr">action</span>: <span class="hljs-string">'pause'</span>,
      }, {
        <span class="hljs-attr">name</span>: <span class="hljs-string">'ended'</span>,
        <span class="hljs-attr">label</span>: <span class="hljs-string">'Video-Freecodecamp'</span>,
        <span class="hljs-attr">action</span>: <span class="hljs-string">'ended'</span>,
      }, {
        <span class="hljs-attr">name</span>: <span class="hljs-string">'error'</span>,
        <span class="hljs-attr">label</span>: <span class="hljs-string">'Video-Freecodecamp'</span>,
        <span class="hljs-attr">action</span>: <span class="hljs-string">'error'</span>,
      }]
    });
</code></pre><p>Then, in GA we can see the events taking place. This screenshot shows my own real-time activity - with two devices and browsers - on the video example created for this post.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/11/imaxe-4.png" alt="Image" width="600" height="400" loading="lazy"></p>
<h2 id="heading-summary">Summary</h2>
<p>I have reviewed the main aspects involved in a video publishing pipeline, from transcoding, to delivery, visualization, and analytics. I have pointed to different resources you can use along the way, including two prominent open source projects: ffmpeg and Video.js.</p>
<p>I have supported the explanation with a simple example using our <a target="_blank" href="https://abraia.me/video/">video publishing demo</a>. It gives full access to the resources created. You'll be able to download, modify, and use the resources in your tests. You can freely use it to repeat the process with a short video of your choice. </p>
<p>Remember to start with a high quality video. The example here is based on a 9-second 4K video from <a target="_blank" href="https://www.pexels.com/@cottonbro">@cottonbro</a>. Overall, I hope this post gives you a bird's-eye view of what a custom deployment for video publishing entails.</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
<![CDATA[ Video formats for the web, a short guide to help you choose ]]>
                </title>
                <description>
                    <![CDATA[ By Anton Garcia Diaz Video in the web will be on the rise for long. While embedding Instagram and Youtube videos is simple, there are more and more situations -like many ecommerce use cases- demanding a custom approach to video delivery. When it come... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/video-formats-for-the-web/</link>
                <guid isPermaLink="false">66d45da19f2bec37e2da0604</guid>
                
                    <category>
                        <![CDATA[ tech  ]]>
                    </category>
                
                    <category>
                        <![CDATA[ video ]]>
                    </category>
                
                    <category>
                        <![CDATA[ Web Development ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ freeCodeCamp ]]>
                </dc:creator>
                <pubDate>Fri, 05 Jul 2019 08:45:55 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2019/07/photo-1440404653325-ab127d49abc1-1.jpg" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>By Anton Garcia Diaz</p>
<p>Video on the web will be on the rise for a long time. While embedding Instagram and YouTube videos is simple, there are more and more situations - like many ecommerce use cases - demanding a custom approach to video delivery.</p>
<p>When it comes to setting up a video processing and delivery pipeline, the first decision to make is which video formats to serve. Aspects like UX, support (browsers and systems), compression efficiency, and encoding speed are likely to be relevant to this choice.</p>
<p>Based on my <a target="_blank" href="https://abraia.me/">experience on media optimization for web businesses</a>, I try to highlight here the main aspects to consider. If you are looking for a simple transcoding and optimization option using ffmpeg you may also check <a target="_blank" href="https://medium.com/abraia/video-transcoding-and-optimization-for-web-with-ffmpeg-made-easy-511635214df0">this article</a>.</p>
<h2 id="heading-containers-and-codecs">Containers and codecs</h2>
<p>In contrast to usual <a target="_blank" href="https://www.freecodecamp.org/news/best-image-format-for-web-in-2019-jpeg-webp-heic-avif-41ba0c1b2789/">image formats</a>, it is really important to be aware of the difference between container and coding standard. The file extension tells us which container, but not which codec is being used. And the standard followed to encode the clip will determine if it is supported by the browser or the system.</p>
<p>For instance, while the universally supported video format for web uses a mp4 container and the H264 standard for encoding, not every mp4 file is universally supported since it may be coded under a different standard, like H265.</p>
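<p>This distinction is visible in the MIME types used for feature detection: the codecs parameter, not the container, identifies the coding standard. The codec strings below are typical example values (the exact profile and level vary per file):</p>

```javascript
// Same container, different codec: the codecs parameter in the MIME
// type identifies the coding standard. These strings are common
// example values; exact profiles and levels vary per file.
const formats = {
  h264: 'video/mp4; codecs="avc1.42E01E"',
  h265: 'video/mp4; codecs="hvc1.1.6.L93.B0"',
  vp9: 'video/webm; codecs="vp9"',
  av1: 'video/mp4; codecs="av01.0.05M.08"',
};

// In the browser, HTML5 reports support per MIME type and codec:
// document.createElement('video').canPlayType(formats.h264)
// returns '', 'maybe', or 'probably'.
```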
<p>It gets even a bit more complex with adaptive bitrate (ABR), which is the best way to be responsive to the network and device capabilities of the user.</p>
<p>Let's look at the main combinations of containers and coding and delivery standards, and the differences between them in terms of support, compression efficiency, encoding speed, and user experience.</p>
<h2 id="heading-progressive-video">Progressive video</h2>
<h3 id="heading-h264avc">H264/AVC</h3>
<p>The king format for video features an mp4 container with H264/AVC encoding. Sometimes you'll find it in an m4v container (the default format in Handbrake), an mp4 derivative developed by Apple for H264 videos with DRM protection.</p>
<p>Every browser and system, as well as native applications on both iOS and Android, supports this format. It is the safe choice to avoid compatibility issues.</p>
<p>Moreover, almost any device, from desktop to mobile, features hardware acceleration for H264. It's fast to encode and decode.</p>
<p>In sum, encoding and delivering this format is really easy. Like for images, you can simply insert the link to the video using HTML5 and it will work with any browser out there.</p>
<p>Problems may appear for resolutions over VGA, good visual quality (bitrates around 2000 kbps and over), and durations over several seconds. When viewed over a mobile network (in many regions also over home connections during peak hours) the video may suffer stalls and rebuffering. The alternative of reducing the quality will produce artifacts like blur, mosquito noise, or blockiness.</p>
<h3 id="heading-h265hevc">H265/HEVC</h3>
<p>Using the same container with H265/HEVC encoding we find a powerful video format that yields much higher compression efficiency (<a target="_blank" href="https://www.bbc.co.uk/rd/blog/2016-01-h-dot-265-slash-hevc-vs-h-dot-264-slash-avc-50-percent-bit-rate-savings-verified">about 50% lighter</a>) and much less risk of artifacts other than blur. The problem with this format is that support is limited to Apple devices, which include the hefty royalties in their price. Almost only Safari and iOS apps will be able to use it. If you have many iPhone or Mac users you can include it with a fallback to H264, and the experience for them will be better.</p>
<p>Even with hardware acceleration (available almost only on Apple devices), the higher complexity of this format means that encoding is significantly slower, so producing the variants for delivery takes more computing and more time.</p>
<h3 id="heading-vp9">VP9</h3>
<p>This is the open source, royalty-free answer from Google. Instead of mp4 it uses the webm container, basically an mkv container with the coding standard restricted to VP8 or VP9. It <a target="_blank" href="https://medium.com/netflix-techblog/a-large-scale-comparison-of-x264-x265-and-libvpx-a-sneak-peek-2e81e88f8b0f">brings similar benefits to H265</a>, perhaps a bit less efficient but still much better than H264. Again, it allows you to reduce the file size with much less risk of artifacts other than blur. The encoding speed is similar to H265, which is slow. This may be something to bear in mind, especially in an in-house transcoding pipeline.</p>
<p>Notice that while a previous version (VP8) exists with the same support, we don't recommend it at all, since it does not add any benefits over H264, which is already universally supported. The use of webm is only justified with VP9 encoding.</p>
<p>Of course, support for webm is limited to the Google world, which means Chrome and Android. Again, we'll need a fallback to H264.</p>
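<p>The fallback itself requires no JavaScript: the browser picks the first source it can play. Here's a minimal sketch generating that markup, with illustrative file names and the universally supported H264 mp4 listed last:</p>

```javascript
// Generate <source> tags for an HTML5 video with codec fallbacks.
// The browser picks the first source it can play, so the more
// efficient codecs go first and H264 comes last as the safe fallback.
// File names are illustrative.
function sourceTags(basename) {
  const sources = [
    { src: `${basename}.webm`, type: 'video/webm' }, // VP9
    { src: `${basename}-hevc.mp4`, type: 'video/mp4; codecs="hvc1"' }, // H265
    { src: `${basename}.mp4`, type: 'video/mp4' }, // H264 fallback
  ];
  return sources
    .map((s) => `<source src="${s.src}" type='${s.type}'>`)
    .join('\n');
}

const html = sourceTags('clip');
```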
<h3 id="heading-av1">AV1</h3>
<p>The first stable version of this standard was released in March 2018, with mappings for both mp4 and mkv containers. It delivers similar or slightly higher gains in compression efficiency compared to H265, while being license free. The <a target="_blank" href="https://www.bbc.co.uk/rd/blog/2019-05-av1-codec-streaming-processing-hevc-vvc">latest implementations have also improved the decoding speed compared to H265</a>, making AV1 videos a compelling alternative for web delivery.</p>
<p>The partners involved in the Alliance for Open Media, which created the format, make the case for widespread support in the near future. It promises to sweep all the other formats aside.</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2019/07/imaxe.png" alt="Image" width="600" height="400" loading="lazy">
<em>Partners of the Alliance for Open Media behind AV1</em></p>
<p>However, the implementations currently available should still be considered experimental, and the bottleneck is still encoding speed. The lack of hardware acceleration for this operation is clearly an issue, with the first solutions expected by the end of the year.</p>
<h3 id="heading-vvc">VVC</h3>
<p>The committee responsible for H264/AVC and H265/HEVC has fast-tracked a new standard, with a release expected for 2020. Preliminary tests on the approaches currently considered have shown remarkable gains compared to H265 and AV1. I include it here as a forward-looking note, just to show that the video coding race seems far from over.</p>
<h2 id="heading-adaptive-bitrate-abr">Adaptive bitrate (ABR)</h2>
<p>This is a very interesting alternative to any progressive format. It builds upon an HTTP-based media streaming protocol. In this approach videos are delivered through a master playlist. The playlist offers a representation ladder, with different options of resolution and bitrate that cater to different viewport sizes, network bandwidths, and device capabilities.</p>
<p>Moreover, videos are split into pieces or <em>chunks</em> so that the client may jump from one quality level to another. It is able to adapt to the conditions of the user, namely the network speed but also the viewport size (like switching to full screen).</p>
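<p>The core of that adaptation can be sketched as picking the heaviest rendition that fits the measured bandwidth. Real players also weigh buffer level and viewport size; the ladder values here are illustrative:</p>

```javascript
// Sketch of the client-side adaptation logic: given the ladder of
// renditions (heaviest first) and the currently measured network
// speed, pick the highest bitrate that fits. The numbers are
// illustrative, not a recommendation.
const ladder = [
  { resolution: '1920x1080', bandwidth: 3374012 },
  { resolution: '1280x720', bandwidth: 1836580 },
  { resolution: '856x480', bandwidth: 1002050 },
  { resolution: '640x360', bandwidth: 649329 },
];

function pickRendition(ladder, measuredBps) {
  const affordable = ladder.filter((r) => r.bandwidth <= measuredBps);
  // If even the lowest rendition is too heavy, fall back to it anyway.
  return affordable.length ? affordable[0] : ladder[ladder.length - 1];
}

// On a 2 Mbps connection the 720p rendition fits:
// pickRendition(ladder, 2000000).resolution → '1280x720'
```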
<p>ABR is a big advantage for optimizing UX on mobile devices, avoiding stalls or re-buffering events on mobile networks. If you seek truly responsive behavior, this is clearly the approach to take. There are two main standards, HLS and MPEG-DASH.</p>
<p>Although there is an extended belief that ABR only makes sense for quite long videos, in my experience many situations with fairly short clips can also benefit from this approach.</p>
<h3 id="heading-hls">HLS</h3>
<p>Developed by Apple, this ABR protocol relies on different renditions split into chunks in mp4 format. Originally limited to H264, it now also supports H265. However, as a compromise we would recommend sticking to H264 encoding with HLS, since it brings much better compatibility across a variety of clients.</p>
<p>A big point in favor of this standard is native support in recent Apple devices. For clients other than Safari or native iOS applications, you'll need a viewer. But this is not a big problem, since good open source options like videojs are available out there. Of course, you'll need some effort to customise it and put it to work in your front-end. There are also great transcoding and delivery services doing all this work for you.</p>
<p>Since each rendition should be encoded at a constant bitrate, I recommend combining HLS with per-title encoding. That is, selecting the rendition bitrates based on the content of the video.</p>
<h3 id="heading-mpeg-dash">MPEG-DASH</h3>
<p>This is a codec-agnostic protocol for ABR, so it can work with VP9 encoding as well as with H264 and H265, or even newer alternatives like AV1. The downside is its relative youth, which means it enjoys much less support than HLS. This is why we don't recommend it yet for most web businesses, even large ecommerce stores.</p>
<h2 id="heading-summary">Summary</h2>
<p>After years of predominance of H264/AVC compression, new approaches are animating the scene. The race in display sizes and resolutions is fueling the development of new formats capable of delivering more content within the same bandwidth.</p>
<p>VP9 in webm provides a significant gain in compression efficiency (about 30%), is royalty free, and is supported by Google solutions (Chrome, Android). Going further, H265/HEVC has achieved comparable or better subjective quality at half the bitrate compared to H264. Since neither features universal support, H264 will still be needed, at least as a fallback.</p>
<p>Adaptive bitrate is a compelling alternative, providing unbeatable user experience. In this regard, HLS enjoys wide support with the help of open source viewers. It is possibly the best option for a medium-sized web business. The complexity added by the need for a viewer is fairly well mitigated by the availability of open source initiatives like videojs for in-house solutions, but also by third party services that do the job at competitive prices. If you go down this last route, make sure to ask for <a target="_blank" href="https://abraia.me/docs/video-optimization/#per-title-encoding">per-title encoding</a>.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
