A 4 min 1080p 30fps video taken with my phone camera is 518MB, while a 12 min 1080p 30fps video ripped from YouTube is 341MB. Both are H.264 in an MP4 container and the YouTube one isn't of lower quality, so why this big difference?
Well yes, they have a lower bitrate, but why is the quality the same as a higher-bitrate video? Same resolution, codec, frame rate…
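To put numbers on the bitrate gap, here is a rough back-of-the-envelope sketch in Python. It assumes 1 MB means 10**6 bytes and that the whole file size is video plus audio, so the figures are averages for the file, not exact video bitrates. The function name is made up for illustration.

```python
def average_bitrate_mbps(size_mb: float, minutes: float) -> float:
    """Average bitrate in megabits per second, taking 1 MB as 10**6 bytes."""
    bits = size_mb * 1_000_000 * 8
    seconds = minutes * 60
    return bits / seconds / 1_000_000

# File sizes and durations from the question.
phone = average_bitrate_mbps(518, 4)
youtube = average_bitrate_mbps(341, 12)

print(f"phone:   {phone:.1f} Mbit/s")    # phone:   17.3 Mbit/s
print(f"youtube: {youtube:.1f} Mbit/s")  # youtube: 3.8 Mbit/s
print(f"ratio:   {phone / youtube:.1f}x")  # ratio:   4.6x
```

So the phone file spends roughly four and a half times as many bits per second as the YouTube file.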
Because they aren’t the same quality? Lossy compression is lossy
https://m.youtube.com/watch?v=r6Rp-uo6HmI
https://m.youtube.com/watch?v=h9j89L8eQQk
Because you can compress video without changing the resolution, codec, or frame rate. When a camera records two green pixels, the raw data is (green pixel) (green pixel). After compression it becomes (two green pixels), which takes up less storage space but retains the same information. Thorough compression is computationally expensive, which is why cameras, which have to encode on the fly, typically settle for a quicker and less efficient pass.
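The "two green pixels" idea is basically run-length encoding. Here is a minimal Python sketch of it. This is only an illustration of lossless compression in general: H.264 does not store pixels this way, and the function names are made up for the example.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse runs of identical pixels into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original pixel list."""
    return [value for value, count in runs for _ in range(count)]

row = ["green", "green", "green", "blue", "green", "green"]
encoded = rle_encode(row)

print(encoded)  # [('green', 3), ('blue', 1), ('green', 2)]
print(rle_decode(encoded) == row)  # True: nothing was lost
```

Six pixels become three pairs, and decoding gives back exactly the same row.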
It's the individual frames that are compressed. Essentially the video is unpacked, and detail is culled based on averages across the frames around it. So if the top of the video, for example the sky, doesn't change, then that part is kept static instead of being stored again for every frame.
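A toy Python sketch of the "static sky" idea: store the first frame in full, then for the next frame store only the pixels that changed. Real codecs work on blocks with motion prediction and lossy residuals, so treat this as a simplified illustration under those assumptions. The tiny 3x3 "frames" and the function names are invented for the example.

```python
def frame_delta(prev, curr):
    """Return {(row, col): new_value} for every pixel that changed."""
    return {
        (r, c): curr[r][c]
        for r in range(len(curr))
        for c in range(len(curr[r]))
        if curr[r][c] != prev[r][c]
    }

def apply_delta(prev, delta):
    """Rebuild the next frame from the previous frame plus the changes."""
    frame = [row[:] for row in prev]
    for (r, c), value in delta.items():
        frame[r][c] = value
    return frame

frame1 = [["sky", "sky", "sky"],
          ["sky", "sky", "sky"],
          ["grass", "bird", "grass"]]
frame2 = [["sky", "sky", "sky"],
          ["sky", "bird", "sky"],
          ["grass", "grass", "grass"]]

delta = frame_delta(frame1, frame2)
print(delta)       # {(1, 1): 'bird', (2, 1): 'grass'}
print(len(delta))  # 2 changed pixels stored instead of all 9
print(apply_delta(frame1, delta) == frame2)  # True
```

The unchanged sky row costs nothing in the second frame, which is where most of the savings in video come from.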
It’s not so much properties about the video, but properties about each frame. I can take a 1080p image and blow it up to 8K in GIMP, but it’s got the same detail as a 1080p image.
If you do multiple passes you can alleviate some of the downsides of low bitrates. A low bitrate is always easy to spot in dark areas, though. I despise watching space movies or shows on streaming services because of the resulting excessive banding artifacts.
Video encoding has several tradeoffs:
The video encoding chips in cell phones make sacrifices to preserve encoding speed and battery life (higher computational complexity costs more processing cycles and tends to use more power). So it's simpler encoding, in exchange for less efficient use of bitrate.
YouTube (and all the social media sites) have huge server farms with highly specialized encoding chips for making the videos more efficient with bitrate for quality. That makes sense because videos designed to be watched millions of times could benefit from even a very slight improvement of bitrate in exchange for a one-time cost of complex encoding. It’s also why YouTube tends not to convert to AV1 (very efficient in bitrate for quality, but computationally complex to encode) until a video has a few hundred views, because it’s not clear whether that tradeoff is worth it until they know a lot of people will be watching it.
Netflix customizes even further for a per-video basis and looks for even more specialized tricks on a scene-by-scene basis, because every single one of its videos only needs to be encoded once for each quality/format but will be watched millions of times.
In other words, it’s like any other engineering problem. The engineers choose different tradeoffs based on context, which means that the cell phone applies a different set of tradeoffs compared to the social media site’s server farm.
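You can try the speed-versus-size tradeoff yourself with ffmpeg's software H.264 encoder, libx264. The Python sketch below only builds the two command lines. It assumes ffmpeg with libx264 is installed, the file names are hypothetical, and it is an analogy only: phones use hardware encoders and this is not what YouTube actually runs.

```python
def x264_command(src: str, dst: str, preset: str, crf: int = 23) -> list[str]:
    """Build an ffmpeg command that re-encodes video with libx264.

    preset trades encoding time for compression efficiency;
    crf sets the quality target (23 is the libx264 default).
    """
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264",
        "-preset", preset,
        "-crf", str(crf),
        "-c:a", "copy",  # leave the audio stream untouched
        dst,
    ]

# Hypothetical file names, same quality target, different effort.
fast = x264_command("phone_clip.mp4", "fast.mp4", "ultrafast")
slow = x264_command("phone_clip.mp4", "slow.mp4", "veryslow")

print(" ".join(fast))
print(" ".join(slow))
# To actually run one: import subprocess; subprocess.run(slow, check=True)
```

At the same CRF, the veryslow encode usually takes far longer and comes out noticeably smaller than the ultrafast one, which is the same kind of tradeoff described above.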