
April 7, 2012

LumoLabs: Nikon D800 video function demystified

Nikon D800 FX mode 1080p video frame (click for original size)
The Nikon D800 full frame SLR camera has created a lot of buzz recently. Some would call it hype. While it is clear that its 36 MP still resolution is pretty much unparalleled in the 35mm camera class, the final verdict on its video subsystem is still out, especially in comparison with Canon's 5DmkIII.

One point of interest has been how either camera actually creates its video frames. I now had a chance to apply LumoLabs' testing methodology to a loaner D800 camera and figure it out for 1080p video in FX mode. I also have a look at live view performance.

You may jump to the conclusion at the end if you just want to read what we found, ignoring how we did it :)


Nikon D800 FX mode FullHD 1080p video

The title image shows one frame from a 1080p video taken with the Nikon D800 (in FX mode; it supports a number of crop video modes too). It shows a zone plate test chart which can be used to perform a sampling error frequency analysis.

Please read falklumo.blogspot.de/2009/10/lumolab-welcome-and-testing-methodology.html to learn more about the testing methodology, including access to the original test chart which allows everybody to replicate my analysis.

There is a bit of (gray colored) moiré from the printing process. This is because scaling and printing zone plates is a non-trivial art in itself ;) You can actually measure the printer's native resolution by inspecting the printed zone plate chart. Below, you find a photograph of the print (at 14.6 MP resolution), allowing you to determine which moiré patterns actually stem from the printing process.

Printed zone plate chart (still shot with a 14.6 MP camera, for reference)

However, all colorful moiré patterns are artefacts introduced by the D800 video system. This allows us to precisely measure how it works. Let's have a close look at one of the two center discs:

Analyzed region of interest in the D800 video frame

The big discs are constructed such that the 1080p Nyquist frequency emerges at their outer circle. The two center discs have their edge at twice this Nyquist frequency and the four tiny discs at four times this frequency. Therefore, the false color moiré disc emerges at (149 px / 258 px × 2) or 1.155x the 1080p Nyquist frequency (1247 px). This means that the Nikon D800 samples ~1247 horizontal lines from its sensor.

Now, let's make a back-of-the-envelope calculation:

An FX frame in video mode is taken from a 6720 x 3780 px region, which actually is a 1.095x crop from the full 7360 x 4912 px frame (this information is from the Nikon user guide, translating physical dimensions into pixels). Because 3780 / 1247 = 3.03, and because 1% is our measurement error, we have proof that the Nikon D800 samples every third horizontal line from its sensor.
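
For those who want to redo the arithmetic, here is a tiny Python sketch of this back-of-the-envelope calculation (the 149 px and 258 px radii are my measurements in the frame above; everything else follows from them):

```python
# Back-of-the-envelope check of the line-skipping factor.
measured_radius_px = 149.0   # radius of the false color moire disc (measured)
disc_radius_px     = 258.0   # disc edge = 2x the 1080p Nyquist frequency

# Sampling frequency relative to a full 1080-line sampling:
relative_sampling = measured_radius_px / disc_radius_px * 2   # ~1.155
sampled_lines     = relative_sampling * 1080                  # ~1247 lines

# FX video uses a 6720 x 3780 sensel crop (1.095x, per the Nikon user guide):
skip_factor = 3780 / sampled_lines                            # ~3.03

print(f"sampling at {relative_sampling:.3f}x of 1080 lines, ~{sampled_lines:.0f} lines")
print(f"line-skipping factor: {skip_factor:.2f}, i.e. every third line is read")
```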

A second result is that the ever so slight color moiré for horizontal frequencies disappears at the Nyquist frequency. The D800's AA filter is effective here; the remaining moiré is from the printing. The D800E would show a bit of additional color moiré here, but by far not as strong as in the vertical direction. So, I believe that the Nikon D800 reads out every column of its sensor, i.e., all sensels within a line.

Below is how I believe Nikon implemented the line skipping:

Likely D800 sensel sampling matrix

and here is a slightly more symmetrical scheme which I cannot entirely exclude although I think it isn't used in this mode:
Unlikely sensel sampling matrix
If you look at the likely sensel sampling matrix, you'll see that all sensels which are read out (the ones with a color) form a new RGGB Bayer matrix of sensels, which has the advantage that a standard demosaicing algorithm can be applied to create an RGB frame.
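
To illustrate why the Bayer structure survives, here is a minimal Python sketch. It assumes an RGGB layout with red in the top-left corner, which is only an illustrative assumption, not a statement about the actual filter phase of the D800 sensor: keeping rows 0, 3, 6, ... makes the retained rows alternate in parity, so the skipped readout is again a standard Bayer mosaic.

```python
# Illustrative RGGB Bayer layout (assumed phase, for demonstration only).
def bayer_color(row, col):
    if row % 2 == 0:
        return 'R' if col % 2 == 0 else 'G'
    return 'G' if col % 2 == 0 else 'B'

# Read every third row (0, 3, 6, ...), but every sensel within a row.
for new_row, sensor_row in enumerate(range(0, 12, 3)):
    colors = ''.join(bayer_color(sensor_row, col) for col in range(8))
    print(f"sensor row {sensor_row:2d} -> output row {new_row}: {colors}")

# The output rows alternate RGRGRGRG / GBGBGBGB, i.e. the skipped readout
# is again a valid RGGB Bayer pattern and can be demosaiced as usual.
```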

This is actually similar to what the Canon 5DmkII did. However, there is one important aspect where the D800 is different:

A native 1080p video frame is read out as 6720 x 1260 sensels and demosaiced to a 2240 x 1260 px RGB frame.

And the final 1080p video frame is further downsampled 7:6 to 1920 x 1080 px, which gives the D800 a slight edge in resolution and edge flicker behaviour over a 5DmkII.


High ISO noise in video

What we found has one important consequence: high ISO noise in video! Because of the FX video crop and the skipping of two thirds of the sensels, the ISO performance in video is shifted by a factor of 3.60. E.g., at ISO 12,800, the noise looks as bad as at ISO 46,000 from a camera using all available sensels for video (except for the 16:9 ratio crop, of course).

You may note, however, that the D800 still samples 6720 x 1260 sensels for a 1920 x 1080 frame, or 4.08 sensels per pixel. For this reason, at ISO 12,800, the noise looks as good as at ISO 3,200 in a still image when pixel peeping at the 100% (1:1) level. So, pixel noise in D800 video is 2 stops less than in stills, while it could have been 3.85 stops less when reading out the maximum number of sensels. Whether you consider this bad or good is up to you.
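
The numbers above follow from simple sensel counting. Here is a small Python sketch that reproduces them; the "optimal" readout is a hypothetical camera using a full-width 16:9 crop, and small differences to the figures quoted are rounding:

```python
import math

# Sensel counts behind the noise figures quoted above (dimensions from the article).
read_out = (6720, 1260)               # every 3rd line of the 1.095x FX video crop
frame    = (1920, 1080)               # final video frame

# Hypothetical "optimal" readout: a full-width 16:9 crop of the 7360 px wide sensor.
optimal  = (7360, 7360 * 9 // 16)     # 7360 x 4140 sensels

sensels_read    = read_out[0] * read_out[1]     # ~8.5 M
sensels_optimal = optimal[0] * optimal[1]       # ~30.5 M
pixels_out      = frame[0] * frame[1]           # ~2.1 M

print(f"ISO shift vs. optimal readout:  {sensels_optimal / sensels_read:.2f}x")             # ~3.60
print(f"sensels per output pixel:       {sensels_read / pixels_out:.2f}")                   # ~4.08
print(f"pixel-noise advantage vs still: {math.log2(sensels_read / pixels_out):.2f} stops")  # ~2.0
print(f"possible with optimal readout:  {math.log2(sensels_optimal / pixels_out):.2f} stops")  # ~3.9
```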

Below, I have extracted frames from the ISO comparison performed by crisislab.com:

Video noise comparison D800 vs. 5DmkIII -- original frames (c) 2012 crisislab.com
On the left-hand stripe, I have shifted the D800 samples two stops down, and I think it is then a good match for the 5DmkIII performance.

From that, I can already conclude that the 5DmkIII reads out all its sensels, i.e., does no line skipping. I didn't run a resolution analysis for the 5DmkIII, however. But hearing about resolution complaints for 5DmkIII video, I think they bin pixels before readout. This improves noise and aliasing performance but, unlike downsampling, doesn't help the resolution.


Nikon D800 Live View implementation notes

I have applied our testing methodology to Nikon's live view implementation too.

D800 live view, photograph of the rear LCD (no zoom level)
You see the same false color moiré discs which we have analyzed already. Of course, there is some strong additional moiré from the LCD rasterization. In other words, the D800 also reads only every third line when live view is active (in the example, it is FX video live view).

If we zoom in, we get a result as follows.

D800 live view, photograph of the rear LCD (high zoom level)
You now see different false color moiré discs; they have moved outwards. The sampling frequency is (1692 px / 1935 px × 2) or 1.749x the 1080p Nyquist frequency (1889 px). Because 3780 / 1889 = 2.00, we have proof that the Nikon D800 samples every second horizontal line from its sensor when zooming in far enough in live view.

In live view, the D800 switches from third line to second line skipping when zooming in!
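
Plugging the zoomed live view measurement into the same formula as before (again just a sketch; the 1692 px and 1935 px radii are my measurements from the LCD photograph):

```python
# Zoomed live view: same calculation as for the video frame.
measured_radius_px = 1692.0
disc_radius_px     = 1935.0   # disc edge = 2x the 1080p Nyquist frequency

relative_sampling = measured_radius_px / disc_radius_px * 2   # ~1.749
sampled_lines     = relative_sampling * 1080                  # ~1889 lines
skip_factor       = 3780 / sampled_lines                      # ~2.00

print(f"zoomed live view samples ~{sampled_lines:.0f} lines (skip factor {skip_factor:.2f})")
```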

Lessons for manual focusing: (1) zoom in and (2) focus on vertical structures, which have twice the resolution in live view! Focus on trees or the edges of buildings rather than the horizon or a roof top.


Conclusion

The D800 creates FX 1080p video in the following way (a sketch tracing the frame dimensions through these steps follows the list):
  1. Crop a region of 6720 x 3780 sensels (crop factor 1.095).
  2. Read only every third line out of this region, but all sensels in a line. The result is a 6720 x 1260 sensel RGGB Bayer pattern which can be demosaiced.
  3. The resulting 2240 x 1260 RGB image is downsampled 7:6 to the final 1920 x 1080 px resolution.
  4. Compared to an optimum architecture, only 1/3.6 of the sensels are read, which makes the D800 lose up to 1.8 stops in high ISO video performance.
  5. When zooming into a live view image, the D800 switches line skipping from 3x to 2x.
  6. Manual focus should use zoomed live view, focusing on vertical edges.
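
Here is the promised sketch: a few lines of Python which only trace the frame dimensions through the presumed pipeline. This is pure bookkeeping under the assumptions above; the actual demosaicing and scaling kernels are of course not public.

```python
def d800_fx_1080p_shapes():
    """Trace frame dimensions through the presumed D800 FX 1080p pipeline.
    Bookkeeping only; the actual demosaic/scaling steps are not public."""
    sensor = (4912, 7360)                        # rows, columns (full sensor)
    crop   = (3780, 6720)                        # step 1: 1.095x FX video crop
    bayer  = (crop[0] // 3, crop[1])             # step 2: every 3rd line -> 1260 x 6720
    rgb    = (bayer[0], bayer[1] // 3)           # step 3: demosaiced RGB -> 1260 x 2240
    frame  = (rgb[0] * 6 // 7, rgb[1] * 6 // 7)  # step 3: 7:6 downsample -> 1080 x 1920
    return sensor, crop, bayer, rgb, frame

for name, (rows, cols) in zip(("sensor", "crop", "bayer", "rgb", "frame"),
                              d800_fx_1080p_shapes()):
    print(f"{name:>6}: {cols} x {rows}")
```
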
Overall, I am personally pleased with the implementation Nikon has chosen. It refines an idea originally used in the 5DmkII, an idea which is more difficult to implement here due to the higher overall number of pixels. Because of the downsampling from 1260p to 1080p, I actually expect slightly better resolution than from a 5DmkII or from a camera which bins sensels prior to demosaicing.

On the other hand, there will be no more excuses for line skipping in the future. Not after Nokia got rid of it in their 41 MP 808 mobile phone ...


Enjoy your read :)
Falk

14 comments:

  1. Falk, thanks for the excellent writeup. There are a couple of things I'm trying to grasp correctly. The one that's driving me the craziest is how video/live view can go from a read 1 skip 2 row routine to a read 1/skip 1 routine. It can't be that simple; if it were you would end up leaving out either R or B in the resulting Bayer matrix. As we discussed earlier a read 2/skip 2 row arrangement would generate a proper RGGB Bayer pattern but what effects would that have on resolution?

    I'm also wondering whether the d800 is performing some sort of pixel binning when running in video. The full frame readout rate of the camera is only 4 frames/second, that seems pretty slow to be able to handle 8mp worth of data at video rates.

    Thanks again for taking the time to do the experiment and do the article

    Richard

    Replies
    1. Richard, thanks for our earlier conversation. To recap ...

      I am not sure if pixels in a line are binned. The video signal definitely resolves the single limiting 1080p circle along the x-axis (although at a very low contrast, which is normal near Nyquist). So, if binning occurs, it would probably be binning of 2 sensels only. In the resolution chart, it does look like green resolves better than red or blue. However, the false color moiré discs look like they come from a standard demosaicing (unlike the 5DmkII).

      There is another reason why I consider binning to be unlikely.

      As somebody pointed out to me, the column-parallel ADCs in the Sony chip are of the slow "counting" type: e.g., at 400 MHz, they would need 2^14 / 400 MHz or 41 µs. This time is the same per line or per sensel, as all sensels in a line are read out simultaneously. I don't know the clock speed, but 41 µs is just fast enough (200 (133) ms per FX (DX) frame) to support 4 (6) fps. As the ADC is faster in 12 bit mode, reading a 1.095x 12 bit crop should be doable in 40 ms out of the box, enough for 24p. This is speculative. Just tweak the numbers a little and you see how this could support 30p video.
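
      Just to make these speculative numbers reproducible, here is a tiny Python sketch; the 400 MHz clock and the DX line count are assumptions, so treat all figures as rough:

      ```python
      # Speculative readout timing for a column-parallel "counting" ADC.
      clock_hz = 400e6                                 # assumed counter clock

      def line_time(bits):
          """One line conversion: the counter runs through 2**bits codes."""
          return 2 ** bits / clock_hz                  # 14 bit -> ~41 us, 12 bit -> ~10 us

      fx_lines         = 4912                          # full sensor height
      dx_lines         = 4912 * 2 // 3                 # ~3275 lines, assumed DX crop height
      video_crop_lines = 3780                          # 1.095x FX video crop

      print(f"14-bit line time:       {line_time(14) * 1e6:.1f} us")
      print(f"FX still frame readout: {fx_lines * line_time(14) * 1e3:.0f} ms")
      print(f"DX still frame readout: {dx_lines * line_time(14) * 1e3:.0f} ms")
      print(f"12-bit FX video crop:   {video_crop_lines * line_time(12) * 1e3:.0f} ms (enough for 24p)")
      ```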

      We're obviously close to some limits because otherwise, FX video wouldn't use a 1.095x crop to downsample from.

      BTW, I would have liked a "cinema" option which uses a 1.0x crop to downsample from (1380 rather than 1260 lines), only available at the lower 24p speed. The performance should be there, at the expense of a somewhat larger rolling shutter effect.

      So, it probably isn't technically feasible to increase speed by dropping sensels from a line. So, why then should Nikon do binning? It would have to be done to the already digitized values and I have doubts that the Sony chip has an embedded adder.

      So, my guess is that binning is done in software as part of the demosaicing, binning the 6720x1260 Bayer pattern to a 3360x1260 Bayer pattern prior to demosaicing. This would explain why LV zoom doesn't reveal more than half the horizontal sensor resolution. Binning before demosaicing can reduce color noise and is faster than doing it the other way round.
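
      To illustrate what such a software binning step could look like, here is a small numpy sketch. It is purely illustrative and assumes that pairs of same-colored sensels two columns apart are averaged; whether the firmware actually does anything like this is my guess from above.

      ```python
      import numpy as np

      def bin_bayer_columns_2to1(bayer):
          """Average same-colored sensels two columns apart, halving the width
          while keeping a valid Bayer pattern (e.g. 6720 x 1260 -> 3360 x 1260).
          Illustrative sketch only."""
          rows, cols = bayer.shape
          assert cols % 4 == 0
          groups = bayer.reshape(rows, cols // 4, 4)       # column groups: R G R G (or G B G B)
          out = np.empty((rows, cols // 2), dtype=bayer.dtype)
          out[:, 0::2] = (groups[:, :, 0] + groups[:, :, 2]) / 2   # same color: group cols 0 and 2
          out[:, 1::2] = (groups[:, :, 1] + groups[:, :, 3]) / 2   # same color: group cols 1 and 3
          return out

      bayer = np.random.rand(1260, 6720)                   # stand-in for the 6720 x 1260 readout
      print(bin_bayer_columns_2to1(bayer).shape)           # (1260, 3360)
      ```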

    2. Richard, about 2x line skipping in live view:

      Of course, they can't just read odd or even lines. They probably read two lines, then skip two lines.

      However, I am not sure if I have enough evidence to deduce this.

      If you look into the LV snapshot, you'll see that at around 1/2 sensor resolution (indicated by the inner tiny red lines) there is line-skipping moiré in the y-direction and resolved detail in the x-direction (at low contrast partly due to the fact that this is the zoom level's limit too).

      With a read-2, skip-2 lines scheme, I would expect mild (large) color moiré outside the zone disc at full resolution for the D800 (D800E), a luminance moiré at 1/2 frequency and an extinction-of-contrast disc around 1/4 frequency. Maybe this is indeed what we see: the colors at 1/2 frequency seem to be a bit less striking and there are some gray circles at 1/4 frequency.

      I may provide an update if my loaner D800 returns to me :)

  2. First of all, thank you for your scientific/rigorous analysis! I love reading well thought-out analyses such as yours.

    I do have a few questions though.

    (1) How do you demosaic 6720 x 1260 px to 2240 x 1260 px? That's dropping the horizontal resolution by 3 but not changing the vertical resolution... all the demosaicing algorithms I know about are concerned with interpolating R,G,B values for *every* input pixel. Just curious!

    (2) You say "However, hearing about resolution complaints for 5DmkIII video, I think they bin pixels before read out. This improves noise and aliasing performance but unlike downsampling, doesn't help the resolution."

    I'm a little confused re: this as aliasing results from undersampling a high resolution signal -- wouldn't simple binning essentially be 'undersampling'? Or would this sort of undersampling still be better than line-skipping?

    I guess binning would explain the relatively low-resolution output of 1080p on the 5DIII. Some say it's even softer than the D800 image, though I believe that could also be explained by the small (1.095x as you say) downsampling on the D800 vs. the 11x downsampling the 5DIII would have to perform on the full 22MP data... and the greater your downsampling factor, typically the more post-downsizing sharpening one needs (interestingly, the 5DIII footage holds up well to sharpening).

    Couldn't the poor 5DIII resolution also be explained by a very poor downsizing algorithm? Though binning does make sense... even the demosaicing will be less computationally intensive after binning, right?

    I'd be happy to do this test with your chart & my 5DIII if you think it'll help you arrive at any conclusions re: the 5DIII. I'm just very curious about these things in general!

    Cheers,
    Rishi

    Replies
    1. Hi Rishi,

      thanks for your considerations.

      Ad (1)
      I expressed myself in a fuzzy way on purpose. What I meant was that they start with a 6720x1260 Bayer pattern and end up with a 2240x1260 RGB pattern. So, they may do a demosaicing at 6720x1260 and then downsample (actually, they would then downsample straight to 1920x1080). Or they could first downsample lines (e.g., by adding four RGRG into two RG etc.) and only demosaic the resulting 3360x1260 Bayer pattern, which is faster and probably just as good. There are a number of alternatives and I didn't want to speculate about the specifics, as it doesn't really matter for video (it would matter for zoomed LV though).

      Ad (2)
      A proper downsampling routine has no negative impact on resolution. Binning isn't undersampling, it is more like applying a low pass filter. However, if the binning happens at higher spatial frequencies than visible in HD video, there should still be no significant loss in resolution, just better aliasing and noise behaviour.

      Looking at tests like http://www.youtube.com/watch?v=Cxlc60KCSPQ , I see spurious resolution in both dimensions beyond the Nyquist frequency (marked 500 lp/PH in that test). Binning should have suppressed that more. However, there is no false color rainbow pattern, which is an indication that at least the Bayer matrix isn't simply subsampled.

      Again, proper downsampling only improves the image quality and resolution; look at the Panasonic GH2 or Nokia 808 for example. Therefore, I really don't know what Canon did in the 5DmkIII. Except that they didn't simply downsample from full frames, which would have been the obvious thing to do if they wanted to be the leader.

    2. Note:
      downsampling = rescaling using ALL pixels to create a smaller image with better pixel quality.
      subsampling = rescaling using a SUBSET of pixels to create a smaller image with no better pixel quality.

    3. Simple binning is just averaging of pixels, right (add the signal, divide by # of pixels you added). I take it that's less computationally intensive than downsampling algorithms?

      Maybe Canon is doing some combination of the two?

      I doubt they implemented hardware binning where you actually average pixels on the chip before readout occurs... that'd require actual changes to the sensor chip... like the Phase One Sensor+ technology.

      If they simply subsampled, there'd probably be more aliasing artifacts, which you've mentioned.

      I guess what I'm asking is: does binning (simple averaging of pixels?) reduce chances of aliasing/moire, and reduce noise, while *not* retaining detail as well as a proper downsampling algorithm (e.g. bilinear) would? The latter being too computationally intensive for the 5D Mark III?

      Thanks!
      Rishi

    4. Hi again. I think what hasn't become clear enough: binning after demosaicing and downsampling with a kernel like 'bilinear' are the same thing for all practical purposes. In-camera signal processors should handle all of this well enough.

      It really doesn't make sense to assume that Canon reads out all pixels all the way down to full scale rgb frames only to mess it up in the last easy scaling stage. I'd rather assume that the binning is on-chip to reduce the requirement on off-chip bandwidth.

      But I won't speculate any further about the 5DmkIII in a Nikon article.

    5. Thanks again for your reply.

      Your analysis of the Live View mode on the D800 makes me wonder: do you think contrast detection autofocus is negatively affected by the line-skipping in Live View mode when taking stills?

      Thanks,
      Rishi

  3. Falk, thank you for your research and conclusions. I've used some of your findings in one of the pages from my website on video with the D800: http://photo.vanderkolk.info/photo-nikon-d800-moire-antialiasing-artifacts.php

    Replies
    1. Ron, thank you for a great article and the backlink. However, may I make a small suggestion? You write "All DSLR cameras suffer from moire with video recording, because the sensor image is downsampled to video resolution". However, I think this is a misleading statement. Downsampling actually improves the video quality, as it does in the Nokia 808 phone camera.

      You probably wanted to say "subsampling" where the difference is that "downsampling" is using all pixels while "subsampling" is only using a (small) subset of available pixels.

  4. Hi,

    I've done a similar test on the D7000 and it appears to sample 1920 horizontal lines for 1080p. That's equivalent to 1.44x from the original resolution. Since I'm using a full screen 1080p chart and a wide angle lens with significant distortion, I guess the D7000 samples 2 lines out of 3 in live view and movie mode. Once zoomed in, it does pixel by pixel scanning.


    Jack

    Replies
    1. Hi Jack,

      thanks for your comment.
      BTW, I am a big fan of your blog, liked your H-Alpha conversion article a lot :)
      A sampling frequency of 1/1.44x or 2/3x of Nyquist is a bit strange though. Interesting find! Thanks for sharing.

  5. Hello,

    I was wondering if this applies to uncompressed HDMI footage sent to an external recorder like Atomos? Would noise be less in that case for the D800?

    Thanks,
    Huy


Please, if posting anonymously, choose a nickname for your post. Thanks.