National Park Time Lapse – Tranquility

Since my last astrophotography road trip in California two and a half years ago, I really haven't spent any time writing about travel and photography. Amidst the camera project and my PhD, I have somehow accumulated a pile of decent photographs yet to be processed or released. But all that hard work should serve to produce better images, shouldn't it? So I took a break over the past few weeks to finish off some of that leftover photo work.

Please enjoy my second time lapse compilation – Tranquility

Included are some of the time lapses I took in Big Bend NP, Mojave National Preserve, Death Valley, Pictured Rocks National Lakeshore and Shenandoah NP. Then there's also the Jiuzhai Valley in Sichuan, China!

In terms of astrophotography, I only have a few shots left on the hard drive for release. The road trips I covered recently were on the East Coast. With light pollution and bad weather along the way, there really weren't many stars to be seen, let alone opportunities for deep space imaging.

Cygnus

Wide Field Milky Way Center shot in Death Valley

As for 360 panoramas, they have become routine for me now that the pipeline for 3×6 stitching is well established. In the meantime I have started to incorporate the floor image in the stitching process.

Carlsbad Caverns · The Window · White Sands · Big Bend · Porcupine Mountain · Tybee Island Lighthouse · Shenandoah · Death Valley

Mouse over for location, click for 360° view

The link to my first time lapse compilation is here:

Sony a7S III has a 2×2 pixel binning IMX510 BSI sensor

In 2017 ChipMod took a microscopic image of the IMX235 sensor from a Sony a7S II camera, showing the very large opening of its pixel photodiodes. Then in early 2020 we successfully interfaced it with our custom FPGA board. Now that the third generation has been released, we want to know if the BSI model has improved the image quality.


Read Noise

Read noise in ADU chart from photonstophotos.net

Judging from the read noise chart, the third-generation BSI performs half to a full EV worse than the second-generation FSI sensor. Performance only picks up slightly after the a7S III switches to its high-gain mode above ISO 1600.


DXOMark

Then again, DXOMark shows it performing a third of a stop worse than its second generation. Please note that at 18% raw gray scale the noise is pretty much dominated by photon fluctuation, or shot noise. Thus SNR18 usually reflects the total quantum efficiency. From the chart we can see the FSI IMX235 performs almost as well as the IMX451 BSI sensor from the a7R IV when viewed at the same size. This shows how well a large pixel can be optimized to maximize light collection. Now the question is what happened underneath the pixels to cause such a step back in imaging quality.

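To see why SNR18 tracks quantum efficiency: in the shot-noise-limited regime, the noise equals the square root of the collected signal S = QE × N, where N is the photon count at the 18% gray exposure. Hence

SNR18 = S / sqrt(S) = sqrt(S) = sqrt(QE × N)

so at a fixed exposure, comparing SNR18 between sensors effectively compares the square root of their total QE.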

We recently got an a7S III sensor module damaged by a laser light show. It came with flex cables and an image stabilization (IS) platform. In the image below, the right two flex cables connect to the sensor module. These contain the CMOS Imaging Sensor (CIS) power supply, driving signals and 8 pairs of SLVS-EC pixel data channels. The leftmost cable controls the coils of the IS module.


a7S III module, back

a7S III module, front

We then removed the sensor module from the IS platform. These flex cables are high density at 0.2mm pitch. Unlike previous Sony CIS modules, this one no longer has a central PCB cutout for direct thermal relief to the metal plate. We had to remove the PCB using a rework station to reveal the CIS part number.


IS-1036

a7S III module with the CIS removed

The CIS part number is IMX510AQL in a 294-pin LGA ceramic package, the same package as the a7R IV's IMX451AQL 61MP sensor. But judging from the PCB traces, the pinout definition is different. There were rumors saying the IMX510 was going to be a new 32MP APS-C sensor. This just shows how unreliable and inaccurate unverified leaked specs can be.


IMX510AQL identified

Let's heat off the cover glass and inspect the pixels under a 50x microscope objective. It turns out this sensor is a 2×2 binning design, meaning the IMX510 actually has a 48MP native resolution. The RGGB Bayer pattern is spread across a 4×4 grid. After sensor readout, the four pixels of each color are combined digitally into one pixel before being sent out over the SLVS-EC interface. This could explain the increase in read noise. To my knowledge, none of Sony's DSLR CIS support charge binning due to limitations in their pixel architecture. By combining four pixels digitally, you increase the noise variance by four, and hence the read noise RMS almost doubles (the square root of the variance). The bright green pixels are phase detection pixels for the hybrid AF system.

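A quick numerical sanity check of that claim (a minimal simulation, not measured data): sum four independent reads with noise sigma and compare the RMS of the sum to a single read.

import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                          # assumed per-pixel read noise, e- RMS
reads = rng.normal(0.0, sigma, size=(4, 1_000_000))

single = reads[0]                    # one native 4.2um pixel read
binned = reads.sum(axis=0)           # 2x2 digital binning after readout

print(single.std())                  # ~2.0 e-
print(binned.std())                  # ~4.0 e-: variance x4, RMS x2

Simulated read noise of digital 2×2 binning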


IMX510 (top) 2×2 binning pixels vs a single FSI pixel in IMX235 (bottom), both under a 50x objective

IMX235 with Bayer and microlenses removed, showing its top metal layer



So the final question is why Sony went down this design path. I came up with two possible reasons. 1. Sony already has a BSI pixel design fitting this 4.2um pitch requirement. A 2×2 binning design is a lot faster to bring to market than starting a new 8.4um pixel. Since most pixel design layouts are fixed, scaling the array area yields chips of multiple sizes. For example, IMX411, IMX461, IMX455, IMX571 and IMX533 are all based on the same 3.76um BSI pixel design, but each covers a different imaging circle from medium format down to 1-inch. 2. Sony wants to emphasize the HDR video capability of the a7S III. A single pixel is limited in dynamic range, but you could read out each of the four sub-pixels with a different gain or exposure time and later combine them digitally with weights for the final value. Such a method is used on many Sony security sensors; IMX294 and IMX482 also employ a 2×2 binning BSI design.

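A minimal numpy sketch of that second idea (my illustration, not Sony's actual pipeline): merge two sub-pixels read at a long exposure with two read at a 16x shorter one, recovering highlight headroom.

import numpy as np

# Illustration only: per-output-pixel HDR merge of four sub-pixels,
# two integrating 16x longer than the other two. Numbers are assumptions.
scene = np.logspace(0, 5, 8)                   # photon flux, arbitrary units
full_well = 4000.0                             # assumed sub-pixel full well, e-

long_e = np.clip(scene * 16, 0, full_well)     # clips on highlights
short_e = np.clip(scene, 0, full_well)         # keeps highlight detail

# Use the long exposure where unsaturated, else scale up the short one
merged = np.where(long_e < full_well, long_e, short_e * 16)
print(merged)

Sub-pixel HDR combination sketch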

Regardless of imaging quality, the third generation has a huge improvement in readout speed thanks to its BSI architecture. After all, this camera is mainly aimed at cinematographers. Its all-pixel scan rate has drastically increased from 30FPS to 90FPS, and 1080P60 no longer needs subsampling like the IMX235 did. Engineering has always been a balancing act. Still, it would be great to see a single large BSI pixel without microlenses achieving sCMOS-grade quantum efficiency.


Decoding the SLVS-EC protocol from IMX410BQT

In my D850 hacking post I mentioned another Sony sensor, the IMX410BQT. It bears similar IC packaging to the IMX309AQJ, and the connector shares an identical pinout. We suspected it came from the Nikon D780. Recently the ChipMod workshop received a laser-damaged sensor from a Z6 mirrorless camera, bearing the IMX410BQJ marking code.

IMX410BQJ version (Nikon Z6) on the left and BQT (presumably D780) on the right

The PCB layout is 100% identical between the two. On the J package, a molded plastic frame facilitates mounting to an optical stabilization device.

We decided to plug the BQT version into the Z6 camera, and it works!

This proves both packages are electrically and functionally identical

Signaling

The only difference from the IMX309 resides in the high-speed data outputs. The IMX410 only uses SLVS lanes 0~7. The DDR clock pins and the P/N pins of data lanes 9~15 are all shorted to ground instead. Lane 8 is connected on the sensor PCB side, but its negative pin is shorted to ground and its positive pin left floating on the Z6 flex cable. Based on this, the IMX410 is definitely running the SLVS-EC protocol. The EC stands for embedded clock. Similar to PCIe Gen 1 and 2, data is packed and coded using 8b/10b before going out to the PHY layer. This makes the signal DC balanced and guarantees sufficient transitions for clock recovery.

Unfortunately, SLVS-EC is a proprietary protocol and not as popular as MIPI D-PHY. Detailed information regarding its packet format and encoding is scarce. There aren't many SoCs with open datasheets supporting SLVS-EC sensors. On the FPGA side, there are several IPs supporting the SLVS-EC protocol, but all of them use the gigabit transceivers and require hefty licensing fees. My MicroZed Zynq 7010 only has HR I/O, and my Zynq UltraScale+ ZU4EV only supports up to four lanes on its GTH. Neither looks like an optimal solution to me.

However, since Vivado 2019.1 Update 1, Xilinx has finally certified its UltraScale+ HP I/O for D-PHY data rates up to 2.5Gbps. D-PHY in high-speed mode is in fact running SLVS I/O. This opens the possibility of interfacing up to three 8-lane sensors per HP I/O bank, and saves the GTH transceivers for real full-duplex multi-gigabit applications like PCIe Gen3/4 or 40Gbit Ethernet.

Clocking

To start off, I decided to work on my ver.2 PCB at reduced frequency. If frame rate is not a concern and I am mostly running the slow 14-bit ADC, data transfer won't be a showstopper here. Additionally, SLVS-EC also offers a 1152Mbps line rate besides the default 2304Mbps, if I manage to find the PLL registers.

Another trick is to make the FPGA synchronous to the IMX410 sensor so I can avoid clock data recovery altogether. This is similar to PCIe, where both root complex and endpoint run on the same external 100MHz reference. On the Nikon IMX410/309 module, the 72MHz oscillator output has a T-branch: one end goes into the CIS and the other to the connector. This connector pin was left floating on both the Z6 and D850 flex cables, so I feed this clock pin into an FPGA MRCC pin. If I do not enable the oscillator during the power-on sequence, its output tristates. This makes it possible to drive the clock directly from the FPGA at a reduced frequency.

Normally the sensor runs a 72MHz x32 PLL multiplier giving a 2304Mbps output. I initially reduced it to 40MHz, making 1280Mbps, well within the reach of HR I/O on a 7-series FPGA. The MMCM generates a 320MHz BUFIO clock and a 160MHz fabric sampling clock.

The line period is 6.53us on the Z6 during liveview and 12-bit still shooting mode. In 14-bit mode, the line period extends to 12.65us. Both these timings are further stretched by 80% to account for the reduced master clock (72/40 = 1.8).

Using the Version 2 IMX309 carrier card for IMX410BQT

Protocol Analysis

For decoding, I configured the two ISERDESE2 receivers on each I/O differential pair in 4-bit DDR mode. The positive and negative ends of the ISERDES first scan the transition edges, offset by one IDELAYE2 tap. Once the center of the eye is locked, the negative end shifts to the next eye location, making both ISERDESE2 work in QDR mode under a 320MHz BUFIO clock. Combining odd and even bits generates 8-bit deserialized data that directly feeds my DMA in a logic analyzer mode. In 600ms it captures 768MiB on the PHY layer.

When I'm not generating XHS/XVS pulses, the sensor outputs a constant repeating pattern: 10'b0110001011. Checking the 8b/10b encoding table, this corresponds to D00.0. This is the IDLE code when the SLVS-EC transmitter is active. When XHS/XVS are regularly pulsed, IDLE codes fill the blanking space between data packets. This means I can use D00.0 as the word alignment key given its repeating nature. After some analysis, I found the packet boundary closely matches Table 2 from this Microsemi document.
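
A small Python sketch of that alignment idea (my illustration; the RTL does the same with a barrel shifter): try every bit offset and keep the one where consecutive 10-bit words decode to the observed IDLE symbol most often.

import numpy as np

IDLE = 0b0110001011                  # the repeating D00.0 symbol seen on the wire

def find_alignment(bits):
    # bits: 1-D numpy array of 0/1 PHY samples, assumed in on-wire bit order
    best_off, best_hits = 0, -1
    for off in range(10):
        n = (len(bits) - off) // 10
        words = bits[off:off + n * 10].reshape(n, 10)
        vals = words @ (1 << np.arange(9, -1, -1))   # MSB-first 10-bit words
        hits = int((vals == IDLE).sum())
        if hits > best_hits:
            best_off, best_hits = off, hits
    return best_off

Word alignment search over a PHY capture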

Start Code K.28.5 – K.27.7 – K.28.2 – K.27.7

End Code K.28.5 – K.29.7 – K.30.7 – K.29.7

Pad Code K.23.7 – K.28.4 – K.28.6 – K.28.3

Deskew Code K.28.5 – 0x60 – 0x60 – 0x60

On the PHY layer, each packet begins with a Start Code and stops with an End Code, which is immediately followed by a Deskew Code. Between the start and end codes is packet data with occasional Pad Codes. These Pad Codes are inserted at arbitrary positions and should be ignored by the PHY layer decoder. Their locations are probably due to the frequency mismatch between the XHS pulses and the internal SLVS-EC transmitter.

IDLE .. IDLE – Start – Packet Data – End – Deskew – IDLE .. IDLE

After 8b/10b decoding yields clean 8-bit data, the packet bytes are distributed byte-wise over the eight data lanes. After reassembly, each packet begins with a 24-byte header. On the IMX410 this header is 8 bytes of useful information duplicated three times for redundancy. The fields within this 8-byte block are defined as follows:

Bits    Field
1       SOF
1       EOF
1       Line valid
13      Index (1..8191)
1       Embedded
31      Reserved (fixed to 0)
16      ECC

The index value begins at 1 and is reset at the first XHS after every XVS pulse. The first few rows before the effective pixels are embedded data. From my observation of the IMX410, the reserved fields are set to zero. The ECC field is a 16-bit XOR over selected bits of the 48 MSBs of the 8-byte header. I have decoded the following bits, excluding the zero-fixed reserved fields.

Bit    Field    ECC bits affected
31    EBD     0b1000000000111111
32    IDX0     0b1000000001111011
33    IDX1     0b1000000011110011
34    IDX2     0b1000000111100011
35    IDX3     0b1000001111000011
36    IDX4     0b1000011110000011
37    IDX5     0b1000111100000011
38    IDX6     0b1001111000000011
39    IDX7     0b1011110000000011
40    IDX8     0b1111100000000011
41    IDX9     0b0111000000000011
42    IDX10     0b1110000000000110
43    IDX11     0b0100000000001001
44    IDX12     0b1000000000010010
45    Valid     0b1000000000100001
46    EOF     0b1000000001000111
47    SOF     0b1000000010001011

A 1 in the ECC bits column indicates that the corresponding ECC bit includes that input bit in its XOR
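
In software the check is only a few lines. This sketch reproduces the ECC from the masks above (my reconstruction; the input is the upper 48 bits of the header, renumbered so that bit 47 is SOF):

# ECC masks decoded above: one 16-bit mask per header input bit
ECC_MASKS = {
    31: 0b1000000000111111,  # EBD
    32: 0b1000000001111011,  # IDX0
    33: 0b1000000011110011,  # IDX1
    34: 0b1000000111100011,  # IDX2
    35: 0b1000001111000011,  # IDX3
    36: 0b1000011110000011,  # IDX4
    37: 0b1000111100000011,  # IDX5
    38: 0b1001111000000011,  # IDX6
    39: 0b1011110000000011,  # IDX7
    40: 0b1111100000000011,  # IDX8
    41: 0b0111000000000011,  # IDX9
    42: 0b1110000000000110,  # IDX10
    43: 0b0100000000001001,  # IDX11
    44: 0b1000000000010010,  # IDX12
    45: 0b1000000000100001,  # Valid
    46: 0b1000000001000111,  # EOF
    47: 0b1000000010001011,  # SOF
}

def header_ecc(msb48):
    # XOR together the masks of every set input bit
    ecc = 0
    for bit, mask in ECC_MASKS.items():
        if (msb48 >> bit) & 1:
            ecc ^= mask
    return ecc

Header ECC reconstructed from the decoded masks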

Following the 24-byte packet header is the payload, organized in chunks of 224 + 4 bytes. The 224 bytes carry packed pixel data, followed by a four-byte parity checksum for that 224-byte chunk. The last chunk may fall short of 224 bytes but still carries a four-byte checksum. All the data chunks are concatenated to form the packed pixel stream for each row.

The packing method is identical to MIPI D-PHY. Each packed block is 8-bit aligned: 12-bit RAW packs 2 pixels into 3 bytes and 14-bit RAW packs 4 pixels into 7 bytes. The 8 MSBs of each pixel are stored first, followed by an MSB-justified concatenation of the LSBs.

# data: uint16 array of shape (rows, groups, 7), seven payload bytes per
# 4-pixel group. Bytes 0-3 hold the 8 MSBs of p0..p3; bytes 4-6 hold the
# MSB-justified 6-bit remainders.
p0 = data[:,:,0] << 6 | ((data[:,:,4] >> 2) & 0x3F)

p1 = data[:,:,1] << 6 | (((data[:,:,4] << 4) | (data[:,:,5] >> 4)) & 0x3F)

p2 = data[:,:,2] << 6 | (((data[:,:,5] << 2) | (data[:,:,6] >> 6)) & 0x3F)

p3 = data[:,:,3] << 6 | (data[:,:,6] & 0x3F)

Numpy soft decoding of 14-bit RAW data
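
For the 12-bit modes, a similar sketch (mine, following the 14-bit pattern above and untested against hardware) strips the per-chunk checksums and unpacks the 2-pixel/3-byte groups:

import numpy as np

def strip_checksums(row):
    # Drop the 4-byte checksum after every 224-byte chunk (last may be short)
    out = bytearray()
    for i in range(0, len(row), 228):
        out += row[i:i + 228][:-4]
    return bytes(out)

def unpack12(payload):
    # 2 pixels in 3 bytes; byte 2 holds the MSB-justified 4-bit remainders.
    # Assumes len(payload) is a multiple of 3.
    b = np.frombuffer(payload, np.uint8).astype(np.uint16).reshape(-1, 3)
    p0 = (b[:, 0] << 4) | (b[:, 2] >> 4)
    p1 = (b[:, 1] << 4) | (b[:, 2] & 0x0F)
    return np.stack([p0, p1], axis=1).ravel()

12-bit chunk stripping and unpacking sketch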

Decoded Image

Now I finally have valid data demonstrating successful image capture. The image size is 6104 x 4234 in still capture mode. Read noise is low, with no visible row ripple from the power supply on my carrier card.

My next goal is to improve signal integrity and increase readout clock frequency.

Differences from the IMX410AQL version in the Sony A7III

Sony also employed the IMX410AQL in their latest A7III mirrorless camera. In the beginning I tried using the SPI configuration sniffed from a Sony A7III on the BQT version. The SLVS transmitter turned on, and with specific register writes the sensor responded to XHS/XVS pulses by generating training sequences. But the BQT version never output a valid image, only empty data. Clearly the BQT version is customized and requires some other private configuration.

With the SPI configuration from an actual Nikon Z6 camera, this BQT finally works. I compared the final register settings between Sony and Nikon. A lot of them differ, but all the functional setting registers (mode, analog/digital gain, shutter, ROI, etc.) share the same addresses. This CIS might have a one-time-programmable area burnt at the fab with driving settings customized for Nikon.

Both the Nikon Z6 and Sony A7III also have on-chip phase detection pixels. Sensitivity compensation for those pixels is done in the ISP. Thus with direct readout, I should be able to see those pixels when the lens cap is off.

Repeated horizontal lines from the blue Bayer channel (12bit silent still)

Repeated bright row every 12 rows (6 here since a single Bayer channel is extracted)

Usually phase detection pixels have 50% of their opening masked, so they should be dimmer, not brighter. To find out why, I asked the ChipMod lab to investigate under a microscope.

It turns out that at the phase detection sites the blue pixels are replaced with green dye. For most light sources the green channel has more photons, and it makes sense for focusing elements to get more light.

A Bayer-removed IMX410BQT under 40x magnification

I wonder how BSI focusing pixels have their masks implemented. Unlike FSI pixels, there is no metal-1 layer to leave half open.

Phase detection pixels on IMX410AQL shows a more irregular pattern

Nikon's phase detection pixels stride regularly, every 2 columns and 12 rows. Sony has a more irregular pattern, every 4 columns with 6 or 12 rows in between. There's another version on Sony's website, the IMX410CQK, which probably doesn't have any phase detection pixels.

Signal Integrity

To be continued…

IMX235, IMX071AQE and Foveon F20A

In early May this year I extended the work on the IMX309 to other Sony sensors that share a similar serial data protocol. The IMX235 is at the heart of the Sony A7S and A7S II. Its large pixel gives a better fill factor than smaller pixel designs at the same process node. In fact, the 18% SNR curve puts this sensor on par with the Nikon D850's back-illuminated CMOS.

Bayer-stripped IMX235AQR from a Sony A7S, with a new cover glass installed by ChipMod

There were two packaging versions, AQR and AQL. Both share the same LGA pinout; the latter has an integrated mounting frame to minimize the dimensions required in a 5-axis image stabilization system. We compared the SPI register settings and both are the same. The interface is a 12-lane sub-LVDS running at 432Mbps DDR. This sensor uses the standard ITU sync code with a 4-word SOL/EOL sequence (FFF 000 000 XXX). It also has a dual-row readout similar to the KAC-12040: odd and even rows are distributed between two groups of six data lanes.

Even rows – 0 7 2 9 4 11

Odd rows – 6 1 8 3 10 5

subLVDS lane distribution in 6 x 2 pixel blocks

There is lane multiplexing across the different operating modes. Live preview uses only 4 lanes; 4K video and still capture use all 12 lanes. As for bit depth, still and magnified modes use 14-bit readout, where the line rate is limited by the ADC. 4K video and silent still run a 12-bit all-pixel scan. For the electronic first curtain, the row reset sweep is synchronized by an external signal on the sensor's XPI pin.

XPI sweep after the first XVS pulse, at the same rate the mechanical curtain travels

The charge reset alone is much faster than the ADC readout. On the IMX309 the XPI pin is not used; instead the EFCS sweep is done internally using a series of register settings defining vertical segments and progression speed.

PCB layout with main power supplies

Flex cable connection to the sensor module

Integration with A7S chassis and microZed module

Mounting a Nikkor 50mm lens with adapter

With the rest of control logic and pixel reorganization in place, I can stream video quickly with exposure control.

14bit rolling shutter with vertical readout cropping (click to expand)

IMX071AQE

The Pentax K-5 and K-5 II employed this sensor in a ceramic package. The silicon die looks identical. The MSB lane (7th bit) is not used, and the data stream is sent in parallel mode over sub-LVDS lanes 0~6. The difference in register settings between this and the IMX071AQQ is minimal. I actually tried the AQE register settings on the AQQ variant years ago, but the output remained in the same serial format. It appears the lane setting might be burnt into some OTP memory at the factory.

Same PCB with the other connectors for Pentax K-5

I'm not going into detail, as this one is identical to the AQQ in the D5100/D7000. Even the width of each row read is the same: 5040 pixels. The synchronization sequence consists of four pixel words with some non-standard codes. On the first line the last word is 000E/000A, indicating a start-of-frame sequence.

SOL: 226E – 3715 – 0A84 – 000C

EOL: 026E – 3715 – 0A84 – 0008

Even though this version employs parallel data transmission, the pixel output range is still capped at 0x3FFE just like the serial AQQ version.

Foveon F20A

Another sensor of interest is the unique Foveon X3 design. Photons of different energy (wavelength) are absorbed at different depths with different probabilities. Foveon employed a special silicon process to manufacture three layers of photodiodes in a stacked fashion, so color information can be deconvolved from the varying intensities of the three channels. This method unleashes a huge improvement in spatial resolution. If I could make charge binning before readout possible (perhaps with some pixel driving hack, but very difficult), it would be possible to switch between a color and a monochromatic camera at will!

Before the Merrill generation, the original 4.7MP Foveon F13 sensor (part number FX17-78-F13D-07) was available for sale with FAE support. But after some research I found it lacks integrated ADCs, and its temporal noise is high since it lacks true CDS readout. Each layer also has a dedicated amplifier, making charge binning across depth physically impossible. This also makes the pixel design very complicated: there are 13 transistors in each pixel. Quantum efficiency is relatively high at its peak but falls off rapidly as the wavelength goes into red or blue.


Foveon F20A teardown from Chipworks

The Merrill generation is a step forward. All three photodiodes share the same floating diffusion, each with a dedicated charge transfer gate. The 6T pixel design increases fill factor and pixel density. The F20A also has integrated ADCs.

But Foveon has been part of Sigma since 2008, and I do not think they sell this sensor to third parties anymore. So I took out my hacking skills once more. I picked a second-hand DP1 Merrill camera, as it is cheaper than the SD1 Merrill. They use the same sensor, but the DP1 also has liveview and zoom functionality, a big plus for reverse engineering.


This camera has a good layout. A single flex cable supplies all the power and control signals to the sensor, and five differential lanes transmit the pixel data. The sensor connector has 29 pins at 0.5mm pitch. The main PCB is a rigid-flex design to save some space.

Sigma TRUE II ISP/SoC with 256MB DDR2 in 32bit single memory channel

The firmware should be contained in the NOR flash memory. The back side of this PCB is populated with large passives like the power inductors of the two integrated DCDC regulator ICs. The DP1m takes four calibration frames after the first shutter release; the SDRAM is pretty limited on this camera.

Removing the sensor flex cable revealed existing test points on the sensor PCB. This is exactly what I wanted! After some probing, I found the control interface is a simple two-wire I2C. There are three power rails: 2.5, 3.3 and 4V. The main ISP also drives a 40MHz clock to the sensor.

Judging from the differential common mode and peak swing, this is again sub-LVDS. The other signals are enables for the power regulators and a sensor sync output, all referenced to 3.3V.

X3 Logo and Foveon part number F20A

In the next post we'll take a deep dive into the data fetching, row sync and image characteristics of the Foveon F20A.

Full speed ahead – My new generic VDMA

In 2016 when I built my KAC-12040 camera, I wasn't satisfied with the Xilinx VDMA IP. It only closes timing at 150MHz, and it doesn't support arbitrary sizes for a compressed stream. So I wrote my own DMA engine to exploit the full bandwidth of the AXI-HP port on 7-series devices, and managed to close timing at 200MHz with a 64-bit bus. Back then my carrier card only supported 4 LVDS banks from that sensor, so this bandwidth was more than enough for a 1280×720 RAW stream at 600 FPS.

But to achieve this I had to overclock the LVDS transmitters, which led to stability issues on my engineering-grade sensor at high frequencies. To circumvent this I implemented a sync error detector and dynamically dropped bad frames. This solution works well at moderate overclocking, but as the frequency approaches 200% the drops become so severe that the gain is meaningless. As a result, I couldn't push further for higher horizontal resolution.

In 2017 I ruled the KAC out as an astro-imaging candidate due to image quality concerns and the availability of better alternatives. From that point on I decided to unleash its full potential as a pure high-speed camera. This requires bringing all 8 banks of LVDS signals into the 7010 FPGA. It was done by routing only a single LVDS clock into an MRCC pin and dynamically figuring out the phase relationship for each data bank. Additionally, I could discard the MSB pins on the later LVDS banks, since 12/14-bit ADC readout is slow enough to use only the first two or four banks. This strategy freed enough I/Os for banks 4~7 during high-speed operation.

Layout without length matching, all eye diagram and phase are decided at run time

Back


Soldered PCB with a socket

Improvements

This bandwidth requires a higher AXI clock frequency; the limit set by the 7-series AXI port is 250MHz. It was time to completely rewrite my VDMA. Timing closure poses a significant challenge now, especially when dealing with Artix-7 C-1 fabric. But some careful analysis of my previous design exposed the critical paths to improve:

1. The routing into the hardened ARM processor is very long, and a 64-bit bus can easily cause routing congestion. The solution: avoid additional combinational logic on the data path. Either have a register whose Q output goes directly into the ARM processor, or prepare the write data in a FIFO BRAM with its output directly connected and its read enable as the release control. I chose the second option as it's more elegant. By setting the internal DO_REG = 1, the FIFO gains one more cycle of latency but significantly improved timing.

TData/TLast directly go into ARM processor without interconnect

2. The AXI interconnect is not that good. The additional logic converting bursts to length 16 wastes resources, so I parameterized C_BURST_SIZE and the AWLEN bits correspondingly. When set to 16, the port conforms to AXI3 on the Zynq-7000 ARM processor and the entire AXI interconnect is optimized away at IP integration.

3. Scan TLAST and count bursts as the stream comes in, issuing address writes accordingly. Pipeline the logic as needed and insert a double buffer (skid buffer) where necessary.

The result is quite satisfactory: 328 flip-flops and 276 lookup tables. Timing closure is effortless now.

In the meantime, I rewrote my bit concatenator block, which converts arbitrary bit lengths into 64-bit words. The length of each burst is specified in TUSER. This IP copes with compressed bit streams or bit depths that change dynamically during sensor operation.

Application

Now I can replace this block in every one of my cameras. With some modification of the LVDS receiver in this KAC-12040, I can now stream 16Gbit per second. At extended row width, 3600 x 720 streams stably at 600FPS!

This speed could also extend to some SLVS-EC sensors. Once decoded to 8 bits, 8 channels yield an insane 1.85GB/s data rate.

Update 12/1 – application use

Multiple channels of my VDMA serving six simultaneous MIPI streams

Successfully ported my VDMA to the Xilinx UltraScale+ architecture. This application serves six MIPI streams for a 360-degree surround view on an autonomous vehicle. Timing closes at a 300MHz AXI4 clock.

Leave a message below if you are interested in this IP and its pricing!

No datasheet, No FAE, No problem! – The proper way to hack Nikon D850

Two years ago we identified the sensor inside the Nikon D850 with the ChipMod lab. There's plenty of justification for this sensor in astronomy. It's the first mass-produced back-illuminated full-frame CMOS. It is very fast, supporting various movie resolutions up to 4K30P and 720P at 120FPS. It also has an electronic first curtain and a scan rate fast enough to enable a fully silent rolling image capture mode. The chip packaging is very compact with a single connector, and a metal frame directly attached to the sensor enables quick thermal dissipation. Nothing could be more perfect.

This article will be technical in many aspects and serves as a record of how we approach such problems in the R&D process. Let's roll!

Initial probing and speculation

The first step is to separate the power rails from the signal buses. These power traces are usually thick and connect to multiple pins to reduce resistance and impedance, allowing high current to flow. Many also have capacitors right next to them to reduce ripple. At the same time we can find the ground pins using a multimeter, as well as the control pins connected to each power regulator.

Four thick traces ending with big electrolytic capacitors are power rails

Next we can search for the connector part number using some basic measurements such as pin pitch, pin count and connector type. This one is clearly a mezzanine connector with a middle slot. On DigiKey or Mouser we can quickly filter their vast inventory down to a few dozen components. Then it's really just reading datasheets to see which one matches.

The flex cable tells us more. There are 17 differential lanes, meaning one functions as a clock for the 16 others carrying data. This extra clock lane tells us no embedded clock like PCI-E or USB3 is involved, and the speed should be low enough for most cost-sensitive FPGAs.

The other traces are power enables and sensor controls. Judging from typical large Sony sensors, this one is probably again SPI, running in HD/VD slave driving mode.

Using a sniffer board

This time Nikon designed the connector well, so well that I can flip the flex cable and put the sensor above the main PCB. I decided to build a stacking female-male board for data logging.

This board is just a passive pass-through for most signal traces. The control buses can be tapped.

Flipping the flex cable around and exposing the sensor for easy connection

Logic analysis

As expected, this sensor shares the common SPI protocol just like the IMX071 and IMX094, except the increased functionality requires a 16-bit address space for more registers. In still capture mode, the line period is around 11us. This gives a whopping 15FPS in 14-bit mode, truly amazing! From here I can make some rough deductions about the clock frequency and data rate.

Each data line contains at least 8256 pixels at 14 bits, which distributes 7224 bits over each of the 16 lanes per line period (8256 × 14 / 16 = 7224). The minimum required lane rate is therefore 7224 bits / 11us, about 657Mbps. The supplied clock into the sensor's internal PLL is 72MHz, so the closest multiple should be 720Mbps internally, with a 360MHz DDR clock.

720Mbps should be OK for most FPGA I/O banks. Great!

For the detailed SPI protocol, I wrote some bash and Python scripts to automatically compare the settings between different modes. ISO, shutter, digital gain and region of interest are clearly identified. For the electronic first curtain, the IMX309AQJ appears to use internal register settings to drive the charge reset scan. This contrasts with the Sony A-series, which uses an external pulse.
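
The comparison boils down to something like this minimal sketch (illustrative; it assumes each sniffed dump has been post-processed into a text file of hex "address value" pairs):

import sys

def load(path):
    # Parse a dump into {address: value}
    regs = {}
    for line in open(path):
        addr, val = line.split()[:2]
        regs[int(addr, 16)] = int(val, 16)
    return regs

a, b = load(sys.argv[1]), load(sys.argv[2])
for addr in sorted(set(a) | set(b)):
    if a.get(addr) != b.get(addr):
        print(f"{addr:#06x}: {a.get(addr, 0):#06x} -> {b.get(addr, 0):#06x}")

Register dump comparison sketch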

Power sequencing

Another important aspect is power sequencing. Correct supply voltages and power-on timing are essential; some mixed-signal ICs even require strict power sequencing just to avoid frying the circuit. A logic analyzer is capable of capturing such a sequence, but capacitor charging and discharging delays the apparent edges a lot. Thus in most cases a scope is preferred.

Digital signals rise quickly, but power rails slowly charge their decoupling capacitors

Some rails can sleep during long exposures while the digital circuits are inactive. It's beneficial to log their behavior and relative timing against the other control signals as well. In all, six voltage rails supply this sensor module, three of which are low voltage at 1.2V for the digital logic and high-speed interfaces.

Layout of carrier card

With the information almost complete, I can lay out a carrier card that includes the necessary power regulators and bridges the high-speed signals into the FPGA. Only a single I/O bank is needed.

Carrier card in black solder mask

The carrier card is designed with the IMX309AQJ sensor mechanically centered relative to the MicroZed SoM. Due to board-to-board stack height and room constraints, most regulators are placed on the back side. I wrote simple power sequencing logic mimicking the D850's and verified all voltages are correct. With the sensor attached and the first power-on, I breathed a sigh of relief. Nothing went up in smoke, great!

Driving the sensor and verifying the clock frequency

For a fast bring-up, I could duplicate a configuration register setting from liveview, where the sensor is free running. This drives the high-speed clock continuously for frequency measurement. There are ways to do this without a high-performance scope, using only digital counting logic.

// Target-clock domain: free-running tick counter, latched and cleared
// whenever the 10us strobe crosses over from the AXI clock domain.
reg [15:0] freq_counter = 0;
reg toggle_sync = 0;                            // driven from s00_axi_aclk domain
reg toggle_sync1, toggle_sync2, toggle_sync3;   // synchronizers in clk_div domain
reg [15:0] freq_counter_prev;
always @(posedge clk_div) begin
    toggle_sync1 <= toggle_sync;
    toggle_sync2 <= toggle_sync1;
    toggle_sync3 <= toggle_sync2;

    freq_counter <= freq_counter + 1;
    if (toggle_sync3 != toggle_sync2) begin     // edge on the synchronized toggle
        freq_counter <= 'h0;
        freq_counter_prev <= freq_counter;
    end
end

// AXI clock domain: flip the toggle every 1000 ticks (10us at 100MHz).
// freq_counter_prev is quasi-static between strobes, so a plain
// re-registration is a safe crossing here.
reg [9:0] standard_counter = 0;
reg [15:0] freq_counter_sync;
always @(posedge s00_axi_aclk) begin
    standard_counter <= standard_counter + 1;
    freq_counter_sync <= freq_counter_prev;
    if (standard_counter == 999) begin
        standard_counter <= 0;
        toggle_sync <= ~toggle_sync;
    end
end

Frequency measurement logic

The idea is to latch the tick count at fixed absolute timing intervals. The s00_axi_aclk is a standard 100MHz clock; after 1000 ticks (10us) we signal across the clock domain to latch the tick counter in the target clock domain and reset it. Since direct measurement of the full-rate clock can pose timing constraint issues, we use a BUFR to divide it down to clk_div first, then scale the measured count back up by the divide ratio.

The frequency matches my guess of 360MHz. This is a DDR clock, sampling on both rising and falling edges.

Sensor stack on carrier card

Getting data with ISERDES and DMA

I do not know the phase relationship between this clock and the data lanes, so I migrated my Dynamic Phase Alignment algorithm from the KAC camera to this one. It lets the IOB scan the transition edges to build an eye diagram and sample at the best possible location. In the figure below, "x" indicates where rising and falling edges happen and "-" marks the eye opening region. It's apparent that the skew between data lanes is minimal, even with half an inch of peak difference in trace length. I can simply set a single tap delay value for all of them at the center of the data eye.

There are 16 lanes. I used 1:4 deserialization in the ISERDESE2 to construct a 64-bit AXI stream, which is continuously fed into a DMA engine. The idea is to first make this a high-speed logic analyzer. The dumped binary file should contain everything: frame and row synchronization, video blanking and effective pixel data in serial format.

Sync sequence and data format

Most large-format Sony sensors use a bespoke synchronization sequence instead of the SAV/EAV defined by ITU. From an xxd dump of the DMA binary file, I found this sensor to be no different.

First line – FFF 000 FFF 000 FFF

Other lines – FFF 000 FFF 000 000

There are no end-of-line sequences, so counters must be used in conjunction with the SOL sequence detector state machine.
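
A software model of that detector (a sketch over the decoded word stream as logged above; the RTL state machine is equivalent):

import numpy as np

KEY = np.array([0xFFF, 0x000, 0xFFF, 0x000])

def find_rows(words, row_len):
    # words: decoded samples of one lane; yields (pixel_start, is_frame_start).
    # With no EOL code, a row_len counter advances past each row's pixels.
    i = 0
    while i + 5 <= len(words):
        if np.array_equal(words[i:i + 4], KEY):
            yield i + 5, words[i + 4] == 0xFFF   # 5th word FFF marks frame start
            i += 5 + row_len
        else:
            i += 1

SOL/SOF detection model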

Viewing properly formatted image stream

With this information ready, I can now implement line and row counters for a proper VDMA. The same DMA IP I built for the KAC-12040 is used to provide a 1.6GB/s data rate. Properly formatted images can then be transferred into memory.

As usual, with everything in shape, I implemented a FreeRTOS system to handle Ethernet command and control. Video streams are carried in UDP packets. On typical POSIX systems like macOS and Linux, there are no packet drops over a direct 1G connection.

Right now the lens mount is Canon EOS. Unfortunately I have neither an adapter nor an EF lens with me. Will update with real images later.

Line skipping and video formats

Before thinking about functionality, I first need to figure out the operation modes. This can be done by comparing register settings; some simple Python scripting enables such comparisons.

All still modes use 14-bit ADC readout, including the completely silent mode. To speed up video, it has to sacrifice ADC resolution and readout lines. In total, I found 4 different driving modes. All video modes are 12-bit readout.

Liveview base/1920×1080/1280×720 FX/DX 60 – 24 FPS

1280×720 FX/DX 120/100FPS

4K FX 30 – 24 FPS

4K DX / Liveview Zoom mode

The first two modes are full sensor area readout with horizontal binning and vertical line skipping (subsampling). DX 4K and magnification are 1:1 windowed readout, and I can see no color aliasing. The FX 4K mode scans a larger area and appears to use vertical binning instead of skipping for better quality.

Additional functionalities

With the above analysis, I can isolate the registers responsible for ISO analog gain, digital gain, exposure time and window cropping. Some of these functions can be applied to modes where they are normally not enabled. I combined 14-bit readout with window cropping so that in SNR-critical scenarios a small region can be read at full bit depth.


Partial readout with lots of vertical blanking

A lot more awaits discovery!


Update 9/16

I played around with various movie modes and here are some updates on the imaging sizes.

The 4K high-resolution DX movie mode runs at 5520 x 3070 with 7.5us per row readout. In FX mode this is 8352 x 2328 at 9.34us per row. There are an additional 22 rows ahead of each frame for bias calibration. The 4K output is the result of downsizing from these imaging areas. The DX readout area is roughly 16:9 within the APS-C crop region of this sensor. This sensor can perform a 14-bit ADC conversion in 11us; at 12-bit readout, most single-slope ADCs run at one quarter of that time. The line rate here is limited not by the ADC but by how fast the data can be sent off, so limiting the horizontal region is a perfect solution. In the FX readout mode the aspect ratio is doubled, close to 32:9. If all 4656 lines were read, the frame rate could only reach 23FPS. What Sony probably did is run the ADC twice on alternating lines and vector-add the values before sending them off; the ISP then downsizes along the X axis.

In 1080P mode the resolution happens to be 1/3 in each direction: 2784 x 1854. This mode is similar to the IMX071 we have seen before: the sensor bins the horizontal pixels internally, then skips two rows for each row read.

Update 12/3 – Version 2 PCB

On version two I swapped the 1.2V LDOs for a buck step-down converter to alleviate thermal issues on such a compact PCB. Those LDOs require a minimum 2V input supply. The 1.2V output rails feed the sensor's digital logic, SLVS transmitter and PLL circuits. All of these draw a lot of current, making an LDO very inefficient; besides, digital rails are not as sensitive to supply ripple as their analog counterparts. All three now share the same source regulator, and two of them are gated by a load switch to reduce inrush current during power sequencing.

In addition, I relayed the 72MHz crystal clock across the logic level shifter into the FPGA MRCC pin. This is for the future IMX410BQT sensor. We have this sensor in hand but haven't figured out which camera it came from, presumably the new D780 DSLR. Since the IMX410 in the Sony A7III uses SLVS-EC, we need a reference clock for the PLL to run from. Other I/Os are remapped accordingly to make room in the new layout. The reserved pads for termination resistors are removed; the FPGA's internal LVDS_25 DIFF_TERM works just fine for the SLVS 200mV common mode.

Both sensors look identical, with the same packaging design

Dark sky backyard of Michigan

In 2016 I got my AZ-EQ6. Without a proper imaging setup, I did mostly visual observation with my 150 Rumak. In 2017 when the 70SA Gen II was released, I immediately acquired one. This is a compact quadruplet astrograph offering a very wide field of view at F5. Usually scopes this small only correct for a cropped sensor, but this one covers the 135 full-frame image circle.

Two SkyRover 70SA with different front AR coating

So in June 2017 I restarted my astrophotography journey. Because Ann Arbor is so close to the Detroit metropolitan area, light pollution is a problem. Even if I drive 20 minutes west to the university's radio telescope facility, the light dome from Jackson to the southwest is still an issue. The best place short of going north is a small state park called Lake Hudson, close to the Ohio state line.

Lake Hudson is 50 miles away, a one-hour drive

My first stop in the sky was the Trifid and Lagoon nebulae. This trial exposed some issues I hadn't foreseen. For one, focuser sag gradually drifted my camera out of focus as the telescope tracked across the sky; I had focused the camera during mount alignment and left the focus unchecked after repointing. Composition was another lesson. In pitch darkness, it was very difficult to make out the sky through the optical viewfinder. Getting familiar with the positions of reference stars beforehand would save a vast amount of time for actual imaging.


Lagoon and Trifid Nebula

During the solar eclipse road trip, I got more familiar with the setup. And yes, of course, that deserves a whole new article, but I've been too busy and am still not satisfied with the data processing; our team had gotten a treasure trove of data. On the return journey, I shot NGC7000, the North America Nebula, in Badlands National Park. It was a perfect spot, with absolutely no light surrounding us. It's probably one of the best dark sites in the contiguous United States.

4 hours of exposure at the darkest site yields perfect details

NGC7000 / North America Nebula

One great thing about the AZ-EQ6 is that you can run two scopes side by side

Before the winter storms crawl in, Michigan actually gets a few clear nights every new moon phase. The problem is having them fall on a weekend; thus I would still pursue a clear night when I didn't have a meeting the next day. In September I returned for the Cygnus heart and then the Heart Nebula after midnight.

The exposure was cut short when the clouds rolled in. This is definitely worth a retry, and I should also include the Soul Nebula. For the Cygnus heart region, I did a three-pane mosaic, and the data still requires processing.

Cygnus

Moving into October, IC1396, the Elephant's Trunk, was overhead. At first I confused this target with the Rosette Nebula. It is a much larger region with a star nursery in the center; my APS-C sensor could barely cover it all. Due to the light dome to the east and the emission nature of this nebula, I kept the CLS filter in. A three-hour exposure revealed lots of detail. Let the image speak for itself.


IC1396 Elephant’s Trunk in 3 hours

In December I took the Triangulum Galaxy. The frigid weather made the battery life extremely short, but at the same time there was almost no dark current on the sensor. An hour-long exposure left a clean background.

Triangulum Galaxy

In 2018 I also attempted the awesome eye of NGC7293. This object is highest in August, but due to Michigan's latitude it stays close to the horizon and is affected by the light dome.

NGC7293

Another similar planetary nebula is M57, but this one is so small I had to enlist the 150/1200 Rumak for help.

M57

At F12 the Ring Nebula is still bright. However, the hydrogen outer shell requires orders of magnitude more exposure.

Before I left, the last target was M16, the Eagle Nebula. In this wide region, the Milky Way mixes with the HII emission nebula. Use of the CLS filter made color tuning much more difficult, as the entire galactic plane is broadband spectrum.


M16 Eagle Nebula

Now I need to find another spot amid the light pollution of the Bay Area. The sky is clearer here, but the city light grows significantly. A year has passed and I haven't entirely settled down, so I took out my AstroTrac TT320X again. Far away from city lights, the Mojave Desert is truly dark!

Orion constellation

Check out my 4+ hour Barnard's Loop on my AstroBin

Cheaper yet powerful camera solutions

It's been a while since my last blog post. During this past year, I've built a few other cameras not yet covered on this blog. In the meantime, I have been looking into options to make this work available to fellow amateur astronomers as a viable product. One major blocker is cost. FPGAs are expensive devices for two reasons: 1. They are produced in much smaller volumes than ASICs yet still use state-of-the-art silicon processes. 2. Massive die area is dedicated to routing and configuration logic. Let's look at a simple comparison. The MicroZed board I'm using costs $200 and carries a dual-core Cortex-A9 clocked at 666MHz. Contrast that with the quad-core Raspberry Pi 3B clocked at double the frequency, which costs only $30.

However, using these single-board-computer SoCs is not free of challenges. Most scientific CMOS sensors do not output data over a standard MIPI CSI2 interface and require FPGA fabric to do the conversion. Beyond that, we need to choose a SoC whose CSI2 interfaces support a high enough total bandwidth. Factoring in functionality, it'd be preferable to enable edge computing/storage and provide internet hosting in a single solution. In the end, we concluded the next generation should have the following connectivity:

1. 1000Base-T Ethernet and built-in WiFi support

2. USB3.0 in type-C connector

3. Fast storage with PCI-E NVME SSD

Besides these, the device should be open enough, with a Technical Reference Manual (TRM) and driver source code available for its various IP blocks. The Ras Pi clearly drops out due to limited CSI2 bandwidth and the absence of fast I/O. After a lengthy and careful comparison, I landed on the Rockchip RK3399. It has dual CSI2 providing a total of 1.5GB/s bandwidth and powerful hexa-core A72/A53 processors running above 1.5GHz for any processing. The FriendlyARM NanoPC-T4 is the most compact among all RK3399 dev kits. This board also has its I/O interfaces aligned along one edge, making case design straightforward. It is vastly cheaper than a Zynq MPSoC with similar I/O connectivity.

NanoPC T4

Two MIPI CSI2 connectors on the right

Now the rest is to provide a cheap FPGA bridge between the sensor and the CSI2 interface. The difficult part is of course the 1.5Gbps MIPI CSI2 transmitter. On the datasheet, the 7-series HR bank OSERDES is rated at 1250Mbps. But like any other chip vendor, Xilinx derates the I/O with some conservative margin. It's been shown before that these I/Os can be toggled safely at 1.5Gbps for 1080P60 HDMI operation. But still, that is TMDS33, with a much larger swing than the LVDS/SLVS of MIPI D-PHY. To test this out, I put a compatible connector on the last carrier card design using spare I/Os. Because D-PHY is a mixed I/O standard running on the same wires, only the latest UltraScale+ supports it natively. To combine low-power single-ended LVCMOS12 and high-speed differential SLVS on a cheap 7-series FPGA, we must add an external resistor network according to Figure 10 in Xilinx XAPP894.

PCB resistor network with some rework

It is possible, though, to merge all the LP positive and negative lines respectively to save some I/O if we are only using high-speed differential signaling. In this case, tying these LP lines toggles all four lanes into HS mode simultaneously. The resistor divider ratio has also been changed, because I need to share the same HR bank with LVDS25 signals from the CMOS sensor.

To produce an image, I wrote a test pattern generator that produces a simple ramp, increasing pixel by pixel along each line; each subsequent frame the starting value increases by four. Timing closure was done at 190MHz for the AXI stream, which prevents FIFO underrun at 1.5Gbps over four lanes. I then took the stock OV13850 camera as the target to mimic. A simple bare-metal application runs on the PS7: it listens for I2C command interrupts, configures the MMCM clocking, sets image size and blanking, and enables the core.
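
A software model of that pattern for verifying captures on the RK3399 side (a sketch; the frame size and bit depth here are assumptions matching the OV13850-like mode I mimic):

import numpy as np

def tpg_frame(n, width=4224, height=3136, bits=10):
    # Ramp counts up along each line; the start value advances by 4 per frame
    ramp = (np.arange(width) + 4 * n) & ((1 << bits) - 1)
    return np.tile(ramp, (height, 1)).astype(np.uint16)

Software model of the ramp test pattern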

Finally, some non-trivial changes are needed on the RK3399 side to receive correctly. After a lengthy driver code review, I found two places requiring changes. First, the lane frequency setting in the driver: this eventually populates a V4L2 struct that affects the HS settle timing between the LP and HS transitions. Second, the device tree contains the entry for the number of lanes used by this sensor.

MicroZed stacked on top of the NanoPC T4. Jumper cables are I2C

There's a mode that disables all ISP functions to get RAW data. This proved extremely helpful for verifying data integrity. In the end, we won't need the ISP for astronomical imaging anyway.

Timing of low power toggle plus HS settle costs 21% overhead

Rolling ramp TPG wraps around through HDMI screen preview

This work paves the way for our ongoing full-fledged adapter board. Stay tuned for more information soon!

Phase AF CCD Die Shot

Back in 2014 we were investigating the AF/lens system at NikonHacker. To understand the operation of phase AF, some effort was put into the AF sensor itself. Leaked D1X schematics indicated that 3 linear CCDs made by Sony (ILX105 and ILX107) are incorporated into the MultiCAM-1300. In the old days, a single chip could not handle that many segments of linear pixels on one die, so the light path had to be split and focused onto multiple chips. The same is done on the MultiCAM-2000, which uses 3 chips as well.

Then from the D200 through the D90, a single-chip ILX148 handles all 11 focus points in the newer CAM-1000 AF system. Some teardowns serve as great resources, even showing a die photo of that sensor. Missing in between was the D70's CAM-900. Later I came across a cheap working sensor stripped from a broken D70 and decided to take a look.

Front

Back

The entire module came covered in dust, clearly from a broken camera that fell to the ground. I tore off the 2 pieces of duct tape covering the slit between the chip and the plastic optical assembly. The opening is a metal mask outlining the light transmission boundaries of the 5 focus points.

Then I used a knife to pare off the glue on the sides, exposing the reddish epoxy bonding the chip carrier to the optical module. A gentle pull separated them.

The Sensor

Sensor Die

Now the AF CCD is exposed! You can see a total of 12 linear CCD segments forming 6 pairs.

Let’s look at the back side of the optical assembly to understand why.

Lenslet

It appears each focus point has a pair of microlenses. The center cross-type point uses 2 perpendicular linear segments, thus 4 lenslets. That gives a total of 6 pairs.

To illustrate how this works, I covered the focal plane with a piece of scratch paper and pointed the front toward a light bulb. Here's the image.

Segment-Image

The pattern matches the layout of the linear CCDs.

Now we can mimic a high-contrast target by half-covering 2 focus points with a sticker.

You can see the 2 lenslets form copies of the 2 high-contrast edges in the 2 segments.

When this is relayed from a photographic lens, the distance between the 2 high-contrast edges varies with the defocus. Firmware A then uses some sort of cross-correlation algorithm to determine that distance, which is compared against a calibrated value to get the actual defocus amount used to drive the lens AF motor.
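
A numpy sketch of that idea (my illustration of the algorithm, not Nikon's actual firmware):

import numpy as np

def segment_shift(seg_a, seg_b):
    # Cross-correlate the AC parts of the paired segments; the offset of the
    # correlation peak, minus the calibrated in-focus offset, gives defocus
    a = seg_a - seg_a.mean()
    b = seg_b - seg_b.mean()
    corr = np.correlate(a, b, mode="full")
    return int(corr.argmax()) - (len(b) - 1)

Cross-correlation displacement estimate between paired segments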

So far that's the working principle of the phase AF optics. There's a lot more to dig into: the ASM code of firmware A, and the electrical interface between the AF CCD and the MCU running that code. Here I decided to desolder the CCD from the flex board. The CCD is packaged in a CLCC, and the contacts form an L-shape covering both the side and bottom. It turned out the heat from the soldering iron dissociated the wires from the flexible board before melting the solder on the bottom, destroying all the contact pads on the flexible board.

The backside of the CLCC package has the following marking.


It's a Sony ILX127AA linear CCD. 405 R9KK is the product batch code; "405" indicates it was made in the 5th week of 2004, around the time of the D70 and D70s.

The schematic can be obtained by tracing the wiring. In the diagram below, VREF is probably 3.3V based on the trace. SD0~3 and STB form a simple parallel command interface. CLK is the master clock input. The analog output of pixel intensity is on Vout, synchronized to SYNC.

ILX127AA wiring diagram

Now we can dig into the image sensor die under a microscope. I took more than 50 shots and stitched them with panorama software. The CCD was manufactured on a very old process node, probably larger than 1 micron.

ILX127AA

Click for Large View

The charge transfer is based on a 2-phase CCD. The total number of pixels is around 996; discounting the metal-masked pixels, this reduces to 912, so the name MultiCAM-900 makes sense. The greenish regions are the actual photodiodes. The photo-generated charge is transferred to the shaded region on the left or the top, then clocked and shifted out to the output amplifier. The three long segments are continuous, with dummy pixels between two correlated pixel regions. The six shorter segments forming the left, center and right focus points are each broken in two by the long segments; thus each has its own amplifier. The CCD integrates all the command decoder, segment select and CCD driver logic on chip, as indicated by the vertical grid of synthesized transistors and their metal interconnect.

CMOS Camera – P7: Streaming Lossless RAW Compression

Now this post is about some serious stuff involving video compression. Early this year I decided to make a lossless compression IP core for my camera, in case one day I adapt it for video. And because it's for video, the compression has to be stream-operable and real time: you cannot save data to DDR RAM and do random lookups during compression. JPEG needs to buffer at least 8 rows, as it compresses 8×8 blocks. More complex algorithms such as H264 require even larger transient memory for inter-frame lookup. Most of these lossy compression cores consume a lot of logic resources, which my cheap Zynq 7010 doesn't have, or can't reach the required performance when squeezed into a small device. Besides, I'd prefer a lossless video stream over a lossy one.

There's an algorithm every RAW image format uses but which is rarely implemented in common image formats. NEF, CR2, DNG, you name it: it's the Lossless JPEG defined in 1993. The process is very simple: use the neighboring pixels' intensity to predict the current pixel you'd like to encode. In other words, record the difference instead of the full intensity. It's so simple yet powerful (up to 50% size reduction) because most of our images are continuous tone and lack high spatial frequency detail. This method is called differential pulse-code modulation (DPCM). A Huffman code is then attached in front to record the number of digits.
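
A scalar Python model of the idea (a sketch of the Lossless JPEG scheme; the hardware predictor and Huffman table differ in detail):

import numpy as np

def dpcm_encode(row):
    # Left-neighbor predictor: the stream stores, per pixel, a Huffman code
    # for the category (bit length of the difference) plus that many bits
    diff = np.diff(row.astype(np.int64), prepend=np.int64(row[0]))
    cat = np.zeros_like(diff)
    nz = diff != 0
    cat[nz] = np.floor(np.log2(np.abs(diff[nz]))).astype(np.int64) + 1
    return cat, diff

row = np.array([512, 513, 515, 514, 600], dtype=np.uint16)
cats, diffs = dpcm_encode(row)   # cats: [0, 1, 2, 1, 7]

DPCM category model of the Lossless JPEG encoding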

The block design

Sounds easy, huh? But once I decided to make it parallel and high speed, the implementation became very hard. All the later bits have to be shifted correctly to form a contiguous bit stream. Timing is especially concerning when the number of potential bits gets large with highly parallel data. So I split the process into 8 pipeline stages in lockstep. 6 pixels are processed simultaneously each clock cycle. At 12 bits, the worst possible bit length is 144, that is, 12 bits of Huffman code plus 12 bits of difference code per pixel. The result has to go onto a 64-bit bus by concatenating with the leftover bits from the previous clock cycle. A double buffer is inserted between the concatenator and the compressor, and FIFOs are employed upstream and downstream of the compression core to relieve pressure on the AXI data bus.

Now resource usage is high!

After optimizing some control RTL, the core now closes timing comfortably at 200 MHz. Theoretically it could process at an insane 1.2 GPixel/s (6 pixels × 200 MHz), though this sensor cannot sustain that over its four LVDS banks. Properly modified, the core could suit other sensors with block-based parallel readout. As for resource usage, many of the LUTs cannot be placed in the same slice as the flip-flops. Splitting the bit shifter into more pipeline stages would certainly reduce LUT usage and improve timing, but the FF count would shoot up to match the LUT count, so overall slice usage would probably stay the same.

During testing, I used the Zynq ARM core to set up the Huffman lookup table. The tree can be modified between frames, so compression stays near optimal for the current scene, using a statistics block I built in.
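
A sketch of what that might look like on the CPU side, assuming the statistics block delivers a histogram of difference categories for the previous frame. The heap-based tree construction here is generic textbook Huffman, not the camera's actual firmware, and it ignores JPEG's 16-bit code-length cap.

```python
import heapq

def huffman_lengths(hist):
    """Return {symbol: code_length} for a {symbol: count} histogram."""
    heap = [(count, idx, [sym])
            for idx, (sym, count) in enumerate(hist.items()) if count]
    heapq.heapify(heap)
    lengths = {sym: 0 for _, _, syms in heap for sym in syms}
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, i, s2 = heapq.heappop(heap)
        for sym in s1 + s2:
            lengths[sym] += 1        # every merge deepens the merged symbols
        heapq.heappush(heap, (c1 + c2, i, s1 + s2))
    return lengths

# Smooth scenes weight the histogram toward small categories:
hist = {0: 500, 1: 300, 2: 150, 3: 40, 4: 10}
print(huffman_lengths(hist))  # e.g. {0: 1, 1: 2, 2: 3, 3: 4, 4: 4}
```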

I have now verified that the bit stream decompresses correctly through the DNG/dcraw/libraw converters. The only additions are a file header and the zero bytes stuffed after every 0xFF, in compliance with the JPEG stream format.
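
The stuffing rule itself takes only a few lines to model. This sketch illustrates the JPEG convention, not the actual FPGA output stage: inside entropy-coded data, every 0xFF payload byte is followed by a stuffed 0x00 so decoders never mistake it for a marker.

```python
def stuff_bytes(payload: bytes) -> bytes:
    """Insert 0x00 after every 0xFF, per the JPEG entropy-coded-data rule."""
    out = bytearray()
    for b in payload:
        out.append(b)
        if b == 0xFF:
            out.append(0x00)   # 0xFF00 is read back as a literal 0xFF
    return bytes(out)

print(stuff_bytes(b"\x12\xff\x34").hex())  # '12ff0034'
```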

2017/10/31

The Whitney Challenge

I'm not one for physical challenges. Yet Mt. Whitney holds a very special place: it's the highest point in the contiguous United States, and beyond that, one of the few such summits accessible by trail. That makes it hugely different from, say, the highest point in China, Everest, where professional mountaineering skills are essential, plus huge bucks.

I had been tempted to summit this mountain for a while. So back in 2014, when planning a road trip in California, I decided to stop near Lone Pine for a distant look at the mountain and to gather more detailed information. So here it is: a map from National Geographic and a picture.

image

HAO_4649

The center is actually Lone Pine Peak; Whitney is right under the flagpole

It wasn't until I got home that I found out the image showed the wrong target; the real Whitney is of course much farther away. That gives you a glimpse of the length of the trail, a whopping 22 miles round trip! I gradually realized this wasn't an easy task. Each year some people get injured along the way or, worse, lose their lives here. The danger comes partly from altitude sickness after exertion, and partly from hiking without a buddy to watch over you when an accident happens.

In 2016 I met Weichen, a post-doc and avid amateur mountaineer here in Michigan. This was the perfect chance to take on the challenge. Another big push was that so much was going on in my life that I just wanted to forget. In May this year we didn't win the lottery for an overnight permit; a day-use slot in late July was the only option for two people.

So began our training. I picked a staircase with an elevator next to it, and every other day we did a simulated hike of 99 floors with weighted packs, then ran for cardio the next day. The gradual push made me comfortable carrying even more weight. Three days before we left for Whitney, we peaked at 11 kilometers.

We packed all the gear needed for the hike. For your reference, a headlamp and sufficient batteries are critical, as you don't want to get lost in the middle of the night or fall off a cliff! Trekking poles can be helpful crossing the creek and on the way down. Lastly, don't forget food, candy, and water. The website suggests 3 liters per person per day, and that really is the minimum. There is water along the way if you bring a filtration device; if not, bring plenty of water!

The Great Sand Dunes

Monument Valley

Navajo Mountain and Colorado-San Juan River Junction

Lake Powell, Page, Glen Canyon Dam, Horseshoe Bend and Antelope Canyon

Grand Canyon

At that time, flights to Las Vegas were the most affordable. We arrived at noon the day before the hike. After picking up the car and filling ourselves up at a buffet, we headed toward California through Death Valley National Park.

Badwater Basin


Stovepipe Wells and Sunset

By the time we got to Lone Pine it was already dark. We retrieved the permit from a small locker next to the visitor center, then immediately checked into a motel to get enough sleep for the next day's hike.

Day use permit at Whitney

The sky was clear in the high Sierra early the next day. We left the motel at 3:30 AM, before dawn; it's a 20-minute drive from Lone Pine to Whitney Portal, and from there we started the actual hike at 4:15. Just after the third switchback, we ran into our first trouble at the north fork of Lone Pine Creek. The water level was unusually high, which forced us to wade across barefoot; the stepping stones were mostly submerged.

Moon and Venus, looking back down the valley

The trail stayed on one side of the mountain for the next half hour until it reached Lone Pine Lake. The first crossing has log bridges, but after a while we hit another submerged section with no bridge at all. Once more, we took off our boots to cross.


Me crossing the log bridge near Lone Pine Lake

In the next hour we saw the first sign of snow. The unusually heavy precipitation in California last winter left a lot of snow to melt, and, of course, flooding along the Whitney trail.

Daybreak above Lone Pine Lake

Along the route the trees recede rapidly. After passing Outpost Camp below Mirror Lake, the landscape transforms into barren rock faces. Occasionally there are patches of flowers scattered between the melting snow.

The traverse is at the top of this huge snow patch

Then we encountered the first real challenge: a section of trail completely covered in snow, and we had no ice-climbing equipment, as we hadn't expected any this late in the season. For each step forward, we made sure a trekking pole was planted deep and solid in the snow and the trailing foot wasn't slipping at all. I was scared; this was my first time attempting something so steep on ice.

The red arrow indicates the traverse covered in ice

At 10 AM we finally arrived at Trail Camp, upstream of Consultation Lake. Looking at the huge massif and the icy slope, our chances of making the top by noon faded in a flash. The altitude had already taken its toll on my body; my fingernails were starting to turn purple. We spent 15 minutes recovering with water and candy, then put on sunscreen before marching ahead to the endless 99 switchbacks that rise quickly from 12,000 feet to the trail crest at 13,600.

Looking back down at Consultation Lake (right) and the Trail Camp pond (left)

The last switchback, traversing a 40 degree ice slope

Had I turned back at the previous scary icy traverse, I probably wouldn't have had to suffer this near-heart-stopping endeavor at 13,500 feet. After the last switchback, we had to traverse two more continuous patches of melting snow just before the trail crest. OMG! A 500-meter, 40-degree slope waiting for you to slip at any single step. I tried not to look down, but at every step I had to, just to make sure my foot was planted in the right place.

Exhausted at the trail crest

This little guy wants my food!

At 4,100 meters I started to feel pain in my stomach. I knew I was exhausted, and so was my buddy, but neither of us had any appetite. The weird feeling of the thin air was taking its toll! I essentially forced lunch into my mouth. By our projected timing, we would arrive no earlier than 2 PM. We dropped some of our backup supplies at the John Muir Trail junction and pushed on.

The Hitchcock Lakes in Sequoia National Park

The final 2 miles to the summit were a completely different experience. Traversing the west flank of the peaks leading to Whitney, it was dry and hot, unlike the icy north-facing slope of the valley we had trekked up. For this portion we were left with a single bottle of water between us, so we had to conserve as much as we could.

A final few steps toward victory!

Oh I forgot, we brought two cans of beer! Cheers!

Finally, we made it at 2:20 PM. Cheers! We had brought two cans of beer, but only for photography. I took just one sip, because I knew alcohol would dehydrate us even faster, and we left the other can inside the hut. We logged our names into history and then turned to some serious views!

The highest 360 panorama in the contiguous United States

(click for 360 link)

I took a 360 panorama on the top of Whitney, with love. There was absolutely no time to shoot a time lapse; we headed down immediately.

The needles along the final 2 miles up

We started our way down at 3 PM

On the way up at the trail crest, I had felt we probably would not make it to the top. Now I had the sensation I might die of dehydration before even getting back to the trail crest. Halfway down, we consumed our last drop of water. It became a race against time: dehydrate, or make it back to the trail junction, where one more liter waited for the two of us.

Some trail sections are dangerously close to the cliff edge

But to get more water, we would have to make it back down to Trail Camp at 12,000 feet, where most campers were, which meant facing that icy traverse again. We saw some daring climbers slide down the icy slope; that was a genuinely dangerous move given the low-friction melting snow and the exposed rocks.

As Trail Camp got closer, the hope of surviving finally overtook the fear of death. We met someone along the route who kindly gave us filtered water; the fresh meltwater was cold, yet so refreshing! The sun had already fallen behind the massif. We would have to move faster to get back to Whitney Portal before dark.

We skidded down the frozen ice of Lone Pine Creek

To save time, we followed the fresh tracks of others along the ice of Lone Pine Creek, staying away from the rocks where cracks most often form. Eventually, nearing the snow line, we had to get back on the trail. The sun had set by the time we approached Mirror Lake. Back on the trail, we turned on our headlamps and continued down the mountain.

We followed the river on the way down (GPS record)

The flooding was even worse after a day of snowmelt washing down in torrents, but this time we waded straight across the creek. The icy water actually worked like a painkiller on our hot, swollen ankles.

The Milky Way came out, the first time I had ever seen it in the high Sierra. It was so clear, and in the calm air at high altitude the stars barely twinkled! I decided to sit down for a five-minute rest and just enjoy the sounds of nature from waterfalls, insects, and birds, and the starry night through the gaps between the tall pines.

We finally got back to the portal. Both our cellphone batteries had gone dry, and my buddy's GPS watch had stopped recording between Mirror Lake and Lone Pine Lake. It was a fulfilling feeling after such an achievement: a 22-mile/36-km round trip in snowy, icy conditions without proper gear. It was close to midnight when we checked into the hotel. I took a shower and fell asleep the moment I touched the bed. My legs still hurt the next day; that morning I finished three plates of breakfast while watching the sunlight shine on Mt. Whitney.

Sunset behind the Sierra range at Owens Lake

We left Lone Pine before sunset. I really needed to treat myself to some stargazing. The next stop was Dante's View, overlooking the Badwater Basin of Death Valley.

Along the highway, shimmering lights filled the distant void of the desert. I wondered: would I one day stand on top of that peak again, or would I never again wish to challenge myself with something like this? One thing was certain, though: I had left a lingering love at the top. Good night, Mt. Whitney!

The Milky Way Arch over Badwater Basin

(Click for 360)

Written on Oct 15, 2017