The Making of a Cooled CMOS Camera – P2

In the last post, I uncovered a bug in the Vivado implementation that accidentally removes the DIFF_TERM from my input buffer. With that problem solved, I picked up the project again with the goal of high-speed imaging. Now I'm going to cover the design principles and the intermediate steps toward it.

VDMA

End Result – My customized VDMA IP highlighted

The end goal is a unified, resource-efficient and high-performance VDMA that receives, decodes and transfers data to external memory, in this case the DDR3 on the PS side. The screenshot above looks concise and very satisfying. But in reality, any error along its data path complicates the entire debugging process. So I decided to tackle the modules one at a time, starting from upstream, and combine them only in the final run.

Step 1

The first step is to make sure all receiving banks decode the SOL/EOL sync codes correctly. The stock ILA IP can be attached directly downstream of the receiving module to verify its function. The ILA requires a free-running clock, so I had to run the sensor in high-power, continuous mode to feed a constant clock into the FPGA.

Step 2

External FIFO

Intermediate Design: Receiver feeding multiple external AXI-S Data FIFOs

After verifying that all 4 banks are functioning, the next step is to receive the data with integrity. Because of inter-bank skew, pixels from different banks can arrive in slightly different clock periods, and the source-synchronous clocks are not aligned among banks. Asynchronous FIFOs are the answer. Each bank writes into its own FIFO on its own source clock, while all 4 FIFOs are read out on a single internal clock whenever all of them have data available. The AXI-Stream Infrastructure IP includes a Data FIFO for the AXI4-Stream protocol; the TReady on its slave port does not drop until the FIFO is full. To read from all 4 FIFOs in an aligned manner, the read side must wait until every FIFO presents valid data:

assign Ready = (Data_valid_0_in & Data_valid_1_in) & (Data_valid_2_in & Data_valid_3_in);
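Here is a minimal Python behavioral model of that read rule. It assumes four banks whose words arrive with a hypothetical skew of one cycle per bank. Deques stand in for the per-bank FIFOs, and the model ignores the synchronizer latency of a real asynchronous FIFO. It only shows that popping all four FIFOs when every one of them holds data keeps words of the same pixel group together.

```python
from collections import deque

def aligned_readout(arrivals, n_cycles):
    """Behavioral model of reading 4 per-bank FIFOs in lockstep.

    arrivals: one dict per bank mapping cycle -> word written in that cycle
              (hypothetical, skewed arrival times).
    Returns the tuples read out, one per cycle in which every FIFO had data.
    """
    fifos = [deque() for _ in arrivals]
    out = []
    for cycle in range(n_cycles):
        # Write side: each bank pushes on its own schedule.
        for fifo, schedule in zip(fifos, arrivals):
            if cycle in schedule:
                fifo.append(schedule[cycle])
        # Read side: Ready = valid_0 & valid_1 & valid_2 & valid_3
        if all(fifos):  # a deque is truthy when non-empty
            out.append(tuple(f.popleft() for f in fifos))
    return out

# Bank k receives its i-th word at cycle i + k (one cycle of skew per bank).
arrivals = [{i + k: (k, i) for i in range(3)} for k in range(4)]
groups = aligned_readout(arrivals, 10)
```

Each tuple in `groups` holds word i from all four banks, even though bank 3 delivered its words three cycles after bank 0.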

Step 3

The above method was tested under 2015.2, but for unknown reasons it no longer works in the most recent version. This drove me to use the FIFO_DUAL_CLOCK_MACRO provided by Xilinx, which is essentially a 7 Series BRAM with built-in hardware asynchronous FIFO support. The read enable must be de-asserted in the same cycle that any of the EMPTY flags rises:

assign RDEN_all = !EMPTY_0 & !EMPTY_1 & !EMPTY_2 & !EMPTY_3;

Also, the FIFO must be held in reset for 5 periods of both the read and the write clock before use. I therefore implemented a reset handshake that asserts reset as soon as the source clock starts running, which leaves plenty of time before actual data arrives. Downstream, the data passes through an AXI-S Data FIFO buffer before reaching the AXI-DMA.

 Internal FIFO

Final Step

So far so good: I've got a complete image. But many areas could be improved. First, the pixels are not organized: each 64-bit transfer carries 2 pixels from an even row and 2 from an odd row. This column-wise de-interlacing simply costs too much for an ARM CPU to do. Secondly, at 12-bit ADC resolution or lower, at least a quarter of the bandwidth is wasted, since only 12 of every 16 bits carry data. Under the 150 MHz timing constraint of the AXI-DMA, this severely hinders high-frame-rate operation. So I decided to use an interleaved transfer mechanism coupled with bit packing. Interleaved transfer requires address generation and switching to jump between even and odd rows. Bit packing is a common trick used in many uncompressed RAW formats (DNG, Nikon NEF). Here I pack 2 pixels into 3 bytes at 10/12-bit ADC resolution.
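The 2-pixels-in-3-bytes packing can be sketched in Python as below. The bit layout (first pixel in the low 12 bits of a little-endian 24-bit word) is an assumption for illustration. The post does not specify the exact order used by the hardware or by DNG/NEF. 10-bit samples fit the same scheme with their top bits left at zero.

```python
def pack12(pixels):
    """Pack pairs of 12-bit pixels into 3 bytes each (assumed little-endian layout)."""
    if len(pixels) % 2:
        raise ValueError("pixel count must be even")
    out = bytearray()
    for p0, p1 in zip(pixels[0::2], pixels[1::2]):
        if not (0 <= p0 < 4096 and 0 <= p1 < 4096):
            raise ValueError("pixel out of 12-bit range")
        word = p0 | (p1 << 12)           # 24-bit word holding both pixels
        out += word.to_bytes(3, "little")
    return bytes(out)

def unpack12(data):
    """Inverse of pack12: recover the 12-bit pixels from packed bytes."""
    if len(data) % 3:
        raise ValueError("byte count must be a multiple of 3")
    pixels = []
    for i in range(0, len(data), 3):
        word = int.from_bytes(data[i:i + 3], "little")
        pixels += [word & 0xFFF, word >> 12]
    return pixels
```

For example, `pack12([0xABC, 0x123])` yields `b"\xbc\x3a\x12"`. Because 16 packed pixels occupy exactly 24 bytes, or three 64-bit beats, a hardware packer would naturally work on 3-beat groups.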

The most complicated part is the 4K boundary check required by the AXI protocol: a burst must not cross a 4 KB address boundary. Consider the following code block:

if (Target_addr[11:3] + Burst_count > 512)   // 512 eight-byte beats per 4 KB page
    m_axi_s2mm_awlen <= ((Target_addr | 12'hFF8) - Target_addr) >> 3;
else
    m_axi_s2mm_awlen <= Burst_count - 1;

The Verilog above is doomed to fail at high frequency because of the wide subtraction. The subtraction spans the full width of the target address, and chaining it after the comparison roughly doubles the critical path. The step can instead be split into a pipeline. First, evaluate the condition and register the result as a flag in an FDRE. In the next cycle, use the flag to steer the calculation. The pipeline latency in my implementation is high, 5 clock cycles in total assuming AWReady stays high. However, the address can be generated in advance while the previous data burst is still transferring, so in practice this latency can be ignored.
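As a cross-check of the boundary arithmetic, here is a small Python behavioral model. It is not the author's RTL. It assumes a 64-bit (8-byte) data bus and AXI3 bursts of at most 16 beats, as on the Zynq-7000 HP ports. The comparison and the length selection are split into two hypothetical functions that mirror the two pipeline stages described above. `split_transfer` is an illustrative helper that walks a transfer burst by burst.

```python
PAGE = 4096   # AXI rule: a burst must not cross a 4 KB address boundary
BEAT = 8      # bytes per beat on an assumed 64-bit data bus

def stage1_cross_flag(addr, burst):
    """Stage 1: the wide add/compare only; in hardware this result is registered."""
    # Target_addr[11:3] + Burst_count > 512
    return (addr % PAGE) // BEAT + burst > PAGE // BEAT

def stage2_awlen(addr, burst, cross):
    """Stage 2: the registered flag selects the clipped or the full AWLEN."""
    if cross:
        # ((Target_addr | 12'hFF8) - Target_addr) >> 3: beats to the page end, minus 1
        return ((addr | (PAGE - BEAT)) - addr) // BEAT
    return burst - 1

def split_transfer(addr, total_beats, max_beats=16):
    """Split a beat-aligned transfer into (address, AWLEN) bursts that never cross 4 KB."""
    assert addr % BEAT == 0, "address must be beat-aligned"
    bursts = []
    while total_beats > 0:
        burst = min(total_beats, max_beats)
        cross = stage1_cross_flag(addr, burst)
        awlen = stage2_awlen(addr, burst, cross)
        bursts.append((addr, awlen))
        addr += (awlen + 1) * BEAT
        total_beats -= awlen + 1
    return bursts
```

For example, a 10-beat transfer starting at byte address 4072 becomes a 3-beat burst up to the page end, then a 7-beat burst from 4096: `split_transfer(4072, 10) == [(4072, 2), (4096, 6)]`.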

Eventually, the design met timing at the 200 MHz AXI-HP clock. After hardware validation, the actual bandwidth for 4K@60FPS video is a whopping 1.2 GB/s out of the 1.6 GB/s available.
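The link capacity follows from bus width times clock. The Python below works through that arithmetic. The UHD 3840 × 2160 frame at 60 fps is an assumed geometry for illustration, not necessarily the sensor's actual readout window, so its payload numbers are not meant to reproduce the measured 1.2 GB/s.

```python
def axi_bandwidth_gbs(bus_bits, clock_mhz):
    """Peak throughput of one AXI data channel in GB/s (1 GB = 1e9 bytes), no overhead."""
    return bus_bits / 8 * clock_mhz * 1e6 / 1e9

def payload_gbs(width, height, fps, bits_per_pixel):
    """Raw pixel payload in GB/s for a given frame size, frame rate and storage depth."""
    return width * height * fps * bits_per_pixel / 8 / 1e9

# Capacity of a 64-bit port at the two clocks mentioned in the text
peak_150 = axi_bandwidth_gbs(64, 150)   # 1.2 GB/s
peak_200 = axi_bandwidth_gbs(64, 200)   # 1.6 GB/s

# Hypothetical 3840 x 2160 @ 60 fps: 12-bit samples in 16-bit containers vs packed
unpacked = payload_gbs(3840, 2160, 60, 16)   # ~0.995 GB/s
packed = payload_gbs(3840, 2160, 60, 12)     # ~0.746 GB/s
```

The ratio `packed / unpacked` is 0.75, which is the quarter of the bandwidth that bit packing recovers at 12 bits.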

Implementation

Implementation Map: AXI-mem-interconnect; AXI-GP-interconnect; Peripheral Resets

Receiver and Sync Detector; Interleave FIFO Control; S2MM Channel Core; AXI-GP Slave Control Registers

Synthesis and implementation take only 3 minutes combined, since my VDMA IP contains only 5 core Verilog files. The rest is just 2 AXI interconnects and peripheral resets. Resource utilization in a 7010 device is only 11% of the LUTs, and even less of the flip-flops (FDREs), which leaves plenty of room for future use.

Test pattern looks perfect! In my next post, I’ll showcase some actual images.

2016/12/5

Beware of DIFF_TERM

The Vivado toolchain can occasionally be wrong, and it screwed me up big time!

Some background on where the problem occurred

For serial data reception, this is usually straightforward because each channel is independent of the others. A simple IDELAYE2 can sweep through all possible delay tap values and feed the data to an ISERDESE2 for deserialization. The final value should be set at the midpoint of the range where deserialization is reliable. But for parallel data, all bits in a word must be perfectly deskewed and aligned! Assuming 8 data pairs with 16 taps each to scan through, the number of combinations would be an astronomical 16^8 (about 4.3 billion)! The workaround is to bring in a second IDELAYE2 that independently validates the result of the primary data path. This makes the deskew of each bit independent.
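For scale, a joint search over 8 lines takes 16^8 = 4,294,967,296 trials, while sweeping each line independently takes only 8 × 16 = 128. Once a line can be swept on its own, picking its tap reduces to finding the centre of the widest passing window. The Python sketch below assumes a hypothetical per-tap pass/fail vector, for example whether the deserialized word matched a training pattern at that tap. It is not the author's calibration logic.

```python
def eye_center(passes):
    """Return the tap at the middle of the longest run of passing taps.

    passes: booleans indexed by tap value. Returns None if no tap passed.
    """
    best_start, best_len = None, 0
    start = None
    for tap, ok in enumerate(list(passes) + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = tap                                # a passing window opens
        elif not ok and start is not None:
            if tap - start > best_len:                 # keep the widest window so far
                best_start, best_len = start, tap - start
            start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2
```

With `passes = [False, False, True, True, True, True, True, False, True, True]` the widest window spans taps 2 to 6, so tap 4 is chosen rather than the narrower window at taps 8 and 9.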

How the problem occurred

The implemented design worked perfectly on hardware for each individual LVDS word channel. I combined them later on but did not connect the deskew feedback to the downstream logic for partial validation. This is where Vivado got “smart”: it optimized out all my unused logic, including the secondary (slave) IDELAYE2. But in the process, it also “optimized” away the DIFF_TERM on the IBUFDS_DIFF_OUT! This severely degraded the signal integrity on some bit lines, which in turn made my data reception fail completely.

The following Tcl command can be used to check whether all your IBUFDS instances have DIFF_TERM set (adjust the regular expression to match your own instance names):

get_property DIFF_TERM [get_cells -hier -regexp {.*IBUFDS_inst}]

If every returned value is 1, DIFF_TERM is set to TRUE.

 

Interfacing Nikon CMOS module with ZYNQ SoC

In my last post I showed a CMOS camera in progress. This time I'm going to deviate from that topic a little by interfacing the image sensor module from the D5100, the mighty IMX071.

On the same relay board that serves as the carrier card for the microZed, I intentionally included a flex connector for the Nikon module. The connector carries 8 data pairs and one accompanying clock signal, all using the sub-LVDS signaling standard. The rest of the pins are mostly power switches, SPI for configuration, and synchronization signals.

The same connector on the Nikon D5100 motherboard

The great thing about an FPGA is its versatile I/O standards. On the ZYNQ fabric side, each I/O bank can host multiple I/O standards as long as they share the same voltage. Here I combined LVDS25 and LVCMOS25 (for control) in I/O bank 35. LVDS25 is required to enable the 100 Ohm differential ODT (On-Die Termination). A simplified block design is shown below; a 2.5 V to 1.8 V level shifter is omitted.

Block_design_nikon

The LVDS clock signal drives the entire logic fabric, which includes 2 major components. The first is a signal receiver/decoder that writes data into a FIFO. The AXI-DMA then transfers the data from the FIFO into the DDR3 memory on the processor side through the AXI-HP port. The second is a sensor driver responsible for generating the correct line synchronization pulses for the sensor. The driver is configured through the AXI-GP port by the program running on the PS.

With everything connected, we have the microZed stacked on top of my relay board, the D5100 CMOS module at top right, and its DC-DC power board at bottom right. Upon power-up, the logic loads the bitstream configuration. Once this is done, the program running on the ZYNQ ARM processor configures the system, interrupts and PS I/O ports. Then the sensor powers up, followed by the register setting sequence. The final step is the actual synchronization driving and DMA run. After data acquisition completes, the program writes the image from RAM to the SD card.

A decoded test image (lens not attached) between vertical blanking regions

We are currently designing the full product! I'll keep you posted!

The Making of a Cooled CMOS Camera – P1

As my last post suggested, I have been working on a camera design. Right now the “prototype”, as I would call it, is in the test phase. The project actually dates back 3 years, when we envisioned a large-format CCD imager customized for deep-sky astrophotography. At that time, commercial cameras of this kind were prohibitively expensive. The most suitable monochromatic chip was the interline KAI-11002, with a size of 36 x 24 mm. Unlike a full-frame CCD, which needs a mechanical shutter for exposure control, an interline CCD can handle this electronically. However, the added shielded VCCD region greatly reduces quantum efficiency and full well capacity. Beyond that, Kodak CCDs don't seem to recover QE well enough with microlenses, peaking at 50% and reaching only 30% at 650 nm on a B/W device. Later on we dug deep into the datasheet and soon abandoned the project. The dark current accumulated in the VCCD was simply too high at the slow readout speed required for a decent read noise level.

The KAI-11002ABA in the original plan

What happened next was dramatic. After getting my hands on the D7000 and hacking it, I was shocked by how well CMOS sensors perform. I soon realized the era of CCDs in astronomy might be coming to an end; sooner or later, telescopes too would embrace nearly noiseless CMOS sensors. When Kodak spun off its imaging division as Truesense, it soon released its first CMOS sensor, with sub-4 e- read noise and CCD-like dark current. We decided to give it a try.

KAC

With the sensor in hand, big challenges lay ahead. To speed things up, I decided to use the microZed SOM board as the embedded controller, at least for the prototype, so only the power supplies and the connecting PCB had to be designed. The Zynq-7010 configures the sensor through its SPI MIO from the ARM PS side. The data is received in the FPGA programmable logic (PL) and relayed, one way or another, to the PS DDR3 memory. From there it can undergo complex calibration and be saved to the SD card or transferred over GbE/USB.

The microZed SOM with 1GB DDR3 and various I/O

The board was then designed and fabricated, with a 754 CPU socket holding the sensor. The main PCB contains the voltage regulators, oscillator and temperature-sensing circuits.

Main_PCB

Stack-up

The data lines go through a relay board, which also supplies power to the Zynq PL I/O banks. The whole stack was triple-checked before applying power. After weeks of hardware and software debugging, the sensor was finally configured and running at the designated frame rate. Now it's time to work on the Verilog to receive the data. I'm going to cover that in my next part.

Modify a 754 CPU socket for image sensor

Recently I've been working on a camera design using a CMOS image sensor. The problem is that the sensor comes in a custom uPGA package (1.27 mm pin pitch) with a lot of pins. The dedicated socket for this sensor is very expensive, and it requires force to mount and tools to dismount. Luckily, many old CPU sockets use a similar uPGA standard, and they are ZIF (Zero Insertion Force) sockets. The only problem is the many extra pins you don't want. A straightforward approach is to cut off every unused pin on the socket, but that risks bending an unintended pin in the process. And most often the cut is not clean to the edge, which prevents the socket from sitting flat and smooth on the PCB surface.

After a bit of investigation, I found that the upper movable lid (white) can be easily removed by gently prying the side rail. This exposes the bottom part, which holds the ZIF contacts and pins in each square hole.

 

The lid holding the CPU

The release/lock handle

Bottom part with tons of ZIF contacts

Now with a jumper cable pin (2.54mm pitch male end), you could easily pop out all the unwanted ones, leaving a custom ZIF socket!

And here’s the actual ZIF contact.

The ZIF acceptor assembly is inserted from the bottom and locked inside the square hole. The CPU pin comes down from above. When the locking handle rotates, all the pins are carried and pushed toward the narrower slot between the 2 U-shaped hairpins to establish electrical contact.

The Dark Side of Image Sensors

Astronomical imaging always calls for an ideal image sensor: one with negligible read noise and dark current and almost 100% quantum efficiency. With such a sensor, your SNR is limited only by the sky background noise and the exposure time. In reality, most sensors are far from ideal.

Over the last decade, fierce competition for consumer camera market share and heavy R&D investment have perfected the CMOS sensor to a point far exceeding the performance of CCDs. Here let's look at the imperfections that remain in some of these famous CMOS sensors.

Recently I ported the “Dark Current Enable Tool” to the 3rd generation Nikon DSLRs. This made it possible to evaluate the image sensor more accurately and conveniently.

Uneven Bias Level

In the imaging pipeline, the first calibration step is subtracting the master bias frame. This evens out the pedestal across all pixels before any scaling step. The master bias is usually created by combining dozens of bias frames. Just by looking at the master bias, we can check how well the image sensor and readout circuit are made.
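Here is a minimal sketch of building and applying a master bias in plain Python over nested lists. The per-pixel median used as the combining rule is an assumption; a mean or a sigma-clipped mean are common alternatives. The post does not say which one the author's pipeline uses.

```python
from statistics import median

def master_bias(frames):
    """Per-pixel median of a stack of bias frames (each a list of rows, same shape).

    The median rejects outliers such as cosmic-ray hits better than a plain mean.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[median(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]

def subtract_bias(light, bias):
    """Bias-subtract a frame pixel by pixel, before any scaling step."""
    return [[p - b for p, b in zip(lrow, brow)] for lrow, brow in zip(light, bias)]
```

With three 1 × 2 frames `[[100, 101]]`, `[[102, 99]]` and `[[500, 100]]`, the master bias is `[[102, 100]]`. The 500 ADU outlier does not pull the result.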

Portion of bias frame in NC81338L (D3, D700) at ISO200

The above bias frame is from the D700. Because it uses 6 discrete AD9974 dual-channel ADCs for the analog readouts, the uneven bias level repeating every 12 columns is clearly visible even without frequency analysis. The Sony sensors, like the IMX028 in the D3X, employ column-parallel ADCs. Their column-wise irregularity is much smaller because the voltage comparators share a common ramp DAC reference. There should also be calibration circuits before readout to cancel the differences between these column circuits.
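One way to expose such a channel pattern numerically is to average each column and then fold the column means at the suspected period, here 12 (6 dual-channel ADCs). The Python below is an illustrative sketch with hypothetical helper names. It is not the analysis behind the figure.

```python
def column_means(frame):
    """Mean ADU of each column of a 2-D frame given as a list of rows."""
    n_rows = len(frame)
    return [sum(col) / n_rows for col in zip(*frame)]

def fold_by_period(values, period):
    """Average the values that share the same index modulo `period`.

    For a bias frame read through interleaved ADC channels, folding the
    column means at the channel count exposes each channel's offset.
    """
    sums, counts = [0.0] * period, [0] * period
    for i, v in enumerate(values):
        sums[i % period] += v
        counts[i % period] += 1
    return [s / c for s, c in zip(sums, counts)]
```

On a real bias frame, random noise averages down in each phase bin, while a fixed per-channel offset survives the folding.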

Bias pattern in D3X at ISO 1600 and ISO 100

However, I do notice a weird global pattern in the IMX028 bias frame. At first I thought it was a light leak, but I quickly ruled that out: the pattern decreases with increasing ISO and is absent in 2 of the Bayer channels. It cannot be low-frequency noise in the power circuit either, since the same pattern appears in individual bias frames and at different ISOs. Could this be an issue with this individual sensor? I don't know, but I haven't observed it in any other camera I had data access to.

Read Noise

Most calibration programs only output a master bias frame. During this process, however, the read noise of each pixel can be calculated as the sample standard deviation with little extra effort. (In statistics, the sample SD is a biased estimator, but it is nonetheless consistent when the sample size is large.)
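Computing the per-pixel sample SD alongside the master bias needs no second pass over the stack. The sketch below uses Welford's online algorithm in plain Python. The class name and the flat-list frame format are illustrative assumptions. The result is in ADU; multiplying by the conversion gain (e-/ADU) gives read noise in electrons.

```python
import math

class PixelStats:
    """Running per-pixel mean and sample standard deviation (Welford's algorithm).

    Feeding bias frames one at a time yields the master bias (mean) and a
    read-noise map (sample SD, in ADU) without storing the whole stack.
    """
    def __init__(self, n_pixels):
        self.n = 0
        self.mean = [0.0] * n_pixels
        self.m2 = [0.0] * n_pixels   # running sum of squared deviations

    def add(self, frame):
        """frame: flat list of pixel values in ADU."""
        self.n += 1
        for i, x in enumerate(frame):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def read_noise(self):
        """Sample SD per pixel, using the n - 1 denominator."""
        if self.n < 2:
            raise ValueError("need at least two frames")
        return [math.sqrt(m / (self.n - 1)) for m in self.m2]
```

Welford's update avoids the cancellation error of the naive sum-of-squares formula, which matters when the noise is small compared with the bias pedestal.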

Read noise in D700, D3X, D5100 and D800

The sensor design clearly affects the read noise distribution. Unlike the 4T standalone pixel design in the NC81338L, all the Sony CMOS sensors inside Nikons use a 2.5T shared pixel design. That's what gives you the pairs of noisy pixels.

Photo Response Non-Uniformity

PRNU is very difficult to estimate unless you use a light source that is more uniform than the sensor response itself. Some forms of PRNU, such as stitching artifacts, are more apparent.

Stitching artifact in IMX028

These artifacts are caused by multi-exposure photolithography. The stepper's imaging circle is not large enough to cover a full-frame image sensor die, so the die has to be stitched together like a panorama. The IMX028 has only one seam, while the NC81338L has 3 because a much smaller mask was used.

Astrophotography in pure darkness

In Michigan, I could only see one nebula: the “Michigan Nebula”. Nah, that's just a joke among local amateur astronomers about how often the state's nights are cloudy. For me, the complaint is real. I don't have an observatory for regular imaging. Packing a heavy EQ mount and driving to some dark rural site only to find clouds building up is frustrating and almost unacceptable. Now it seems a road trip every half year could give me better opportunities at the best dark sites in the States.

So here are some examples. During Christmas 2013, I went to Big Bend National Park in Texas. There's absolutely no light pollution from almost any direction except from a desert town outside the park, and the terrain should shield that local glare well.

We entered the park at dusk, but the place we were staying was still about an hour's drive away. The landscape lost its color as the last patch of sky turned completely black. Our headlights and those of passing cars kept us from dark-adapting. But when we stepped out of the car, the brilliant zodiacal light immediately caught my attention. It was so bright that even under the streetlight in a parking lot, I could see it reaching 30 degrees up the sky. Clouds kept us blind for a day and a half, and it was not until the third night that I could view it in its full majesty. Until midnight that day, the zodiacal light remained bright on the horizon.

Zodiacal Light

This time, all the clouds moved away to the west, leaving a clear night for astrophotography. I picked a spot near the park entrance to set up my tracking rig, plus another camera for a time-lapse. Orion's Belt was my imaging priority. With 2 hours and 40 minutes of total exposure, I was able to reveal all the dark nebulae and dust bands adjacent to the bright M42 and the Horsehead.

Orion

Meanwhile, we considered the sunset at Rio Grande Village the most scenic sight after 3 days of lonely driving in the desert.

360 panorama – Sunset of Rio Grande

Six months later, another opportunity took me to the Mojave Desert in California. This time, I had replaced the internal optical glass with one that has an antireflection coating, so all the glare around bright stars and nebula cores is gone. About a 10-minute drive from the small desert town of Baker, I set up my AstroTrac on a sandy road in the Mojave National Preserve. It was dry and hot at such a low altitude. Besides the intermittent wind blowing against me, there was the occasional sound of some unknown animal sheltering in the wasteland. With the glare from Baker and the headlights of passing cars on I-15 to my north, the Rho Ophiuchi Nebula was a perfect target. Yet in this dry heat, trying to sleep inside a car was exhausting. I managed to get 100 minutes of exposure in total.

The Rho Ophiuchi Cloud Complex

This time I used the hacked firmware that preserves the raw output from the sensor. With a custom-made calibration pipeline, I could achieve perfect preprocessing before the actual alignment and stacking.

Meteor and Milky Way

An occasional meteor captured during the time-lapse the same night. The Rho Ophiuchi complex gradually sets into the light dome of southern California as my TT-320X tracks it. The background light still impacts the SNR in the dark nebula.

Some 360 panoramas along the way.

Devil's Postpile

At Mono Lake, I took a panorama of the sky, but it turned out to be more challenging to process. The sky was divided into 7 areas of 4 subframes each. Airglow greatly increased the sky background near the horizon that night.

Mono Lake