CMOS Camera – P5: Ethernet Liveview

To make camera control easier, I spent the last several weeks making a control scheme based on Ethernet. The camera will be a server with LWIP tasks running on a freeRTOS operating system. The client will be my computer of any OS platform. The only thing connects the two will be a 1G Ethernet cable. To speed things up, the client demo program is written in python3.


Client application based on TKinter

Once the RTOS is boot up, a core task will set up the network and instantiate a listening port. On the client side, all control commands are sent through TCP protocol once connection is established. On the application layer, there’s really not much protocol going here. I chose to decode the command using a magic code followed by actual command id. Four commands are established so far:

1. Send Setting

2. Start Capture (RTOS will create the CMOS run task)

3. Halt Capture

4. Send Image

Once TCP handshake is done, client could send 1 and 2 to begin video capture with defined setting. During this time, command 4 will retrieve the latest image to decode and display on GUI. The camera setting includes exposure time and gain, frame definition and on chip binning, shutter mode and ADC depth, as well as many other readout related registers.

The image are transferred in RAW data, which is linear. Thus numpy functions become very helpful here to implement the level control and post readout binning. RAW image can be written to disk as RAW video given a fast enough I/O.

Several ongoing improvements are under progress. First and foremost is the Ethernet performance. In a direct point to point connection, there really should be reliability issue. And according to test, TCP could achieve ~75MB/s on a GigaETH. UDP will be even fast might need to with potential packet drop. But anyway, TCP will be able to handle 24FPS 1080P liveview. But both server and client needs optimization. Other issue includes file saving task on RTOS and better long exposure control.

Cooled CMOS Camera – P4: Lens Mount

Things have been going slowly recently. Instead of improving the image acquisition pipeline, I decided to apply some mechanical touch to make it more stable. The PCI-E connector is without a doubt, the weakest link for the entire structure. Also I need to actually make this a camera by mounting a lens on it, instead of just several pieces of PCBs.


3D Visualization with PCBs

Notice that the linkage of side plate consists of three slots instead of holes. This was designed for tuning the flange distance from the focal plane. Both PCBs are mounted on M3x0.5mm standoffs just like your motherboard in a computer case.


View through the lens mount

An EF macro extension tube is used to mounting the lens. The flange distance is approximately 44mm. The electrical contacts are left float for now. I attached a 50mm 1.8D lens using a mount adapter.

50mm Lens

First image this camera sees through my window.

Cooled CMOS Camera – P3: Image Quality

In the previous post I successfully obtained the test pattern with custom VDMA core. The next step will be to implement an operating system and software on host machine. In order to obtain real time live view and control, both software should be developed in parallel. Thus in this post, let’s take a look at the image quality with a simple baremetal application.

The sensor is capable for 10FPS 14Bit, 30FPS 12Bit, or 70FPS at 10bit ADC resolution. For astrophotography, 14bit provides the best setting for dynamic range and achieves unity gain at default setting. The sensor IR filter holder and the camera mounting plate are still in design. I will only provide a glimpse into some bias and dark images at this moment.

To facilitate dark current estimation, the cover glass protective tape was glued to a piece of cardboard. The whole sensor was then shielded from light with metal can lid. Lastly, the camera assembly was placed inside a box and exposed to -15°C winter temperature. During the process, my camera would continuously acquire 2min dark frames for 2 hours, followed by 50 bias frames.

Bias Hist

Pixel Intensity distribution for a 2×4 repeating block (Magenta, Green, Blue for odd rows)

The above distribution reflects a RAW bias frame. It appears each readout bank has different bias voltage in its construction. The readout banks assignment is a 2 rows by 4 columns repeating pattern, each color for each channel. A spike in the histogram at certain interval implies a scaling factor is applied to odd rows post-digitalization to correct for uneven gain between top and bottom ADCs.

Read Noise Distribution

Read Noise – Mode 3.12 Median 4.13 Mean 4.81

The read noise distribution is obtained by taking standard deviation among 50 bias frames for each pixel. Then I plot the above distribution to look at the mode, median and mean. The result is much better compared to a typical CCD.


Finally the dark current in a series of 2-minute exposures is measured by subtracting master bias frame. Two interesting observations: 1. The density plot gets sharper (taller, narrower) as temperature decreases corresponding to even lower dark generation rate at colder temperature. 2. The bias is drifting with respect to temperature. This could be in my voltage regulator or in the sensor, or a combination of two.

The bias drift is usually compensated internally by the clamping circuit prior to ADC. But I had to turn this calibration off due to a specific issue with this particular sensor design. I will elaborate more in a later post. Thus to measure dark generation rate, I have to use FWHM of the noise distribution and compare against that in a bias frame. At temperature stabilization, FWHM was registered at 8.774, while a corrected bias is 8.415 e-. For a Gaussian distribution, FWHM is 2.3548 of sigma. Thus the variance for the accumulated dark current is 1.113 given the independent noise source. As such, the dark generation rate at this temperature is less than 0.01 eps. Excellent!

Preliminary Summary

The sensor performs well in terms of noise. For long exposure, the dark generation rate in this CMOS is more sensitive to temperature change than CCDs. The dark current is massively reduced when cooled below freezing point. The doubling temperature is below 5°C.


An uncorrected dark frame after 120s exposure showing visible column bias and hot pixels

The Making of a Cooled CMOS Camera – P2

In the last post, I uncovered a bug in the Vivado implementation which accidently removes the DIFF_TERM from my input buffer. With that problem solved, I picked up the project again with a goal to achieve high speed imaging. Now I’m going to cover the design principal and its intermediate steps to achieve it.


End Result – My customized VDMA IP highlighted

The end goal is to have an unified, resource-efficient and high-performance VDMA to receive, decode, and transfer data to an external memory. In this case, it will be the DDR3 on PS side. The screenshot above looks very satisfactory and concise. But in reality, any error in along its data path will complicates the entire debugging process. Thus I decided to tackle them one module at a time from the upstream, only combining them in the final run.

Step 1

The initial step is to make sure all receiving banks are decoding the SOL/EOL sync code properly. The stock ILA IP could be attached to the immediate downstream of the receiving module to verify their function. The ILA requires a free running clock. In this case, I had to let the sensor run in high power and continuous mode to feed a constant clock into FPGA.

Step 2

External FIFO

Intermediate Design: Receiver feeding multiple external AXI-S Data FIFOs

After verifying that all 4 banks are functioning, the next step is to receive the data with integrity. Due to inter-bank skew, each pixel from a different bank could arrive at a slightly different clock period. And the source synchronizing clocks are not aligned among banks. Asynchronous FIFO is the answer. Each bank will feed a FIFO at its own source clock. But all 4 banks are read out under a single internal clock when all FIFOs have available data. The AXI-Stream Infrastructure IPs contains a Data FIFO for AXI4-S protocol. Its TReady on the slave port will not go down before FIFO is full. To read off from all 4 FIFOs in an aligned manner, all data must be valid.

assign Ready = (Data_valid_0_in & Data_valid_1_in) & (Data_valid_2_in & Data_valid_3_in);

Step 3

The above method was tested under 2015.2. It will no longer work in the most recent version for some unknown reasons. This drives me to use the FIFO_DUAL_CLOCK_MACRO provided by Xilinx. It is essentially a BRAM with built in hardware asynchronous FIFO support on 7 Series devices. The read enable signal must be de-asserted in the same cycle when any of their Empty signal rises.

assign RDEN_all = !EMPTY_0 & !EMPTY_1 & !EMPTY_2 & !EMPTY_3;

Also, the FIFO must be reset for 5 period on both reading and writing clock before use. Thus I implemented a resetting handshake mechanism to enable reset as soon as the source clock starts running. This will give plenty time before actual data arrives. The downstream goes through an AXI-S Data FIFO buffer before the AXI-DMA.

 Internal FIFO

Final Step

So far so good, I’ve got a complete image. But many areas should be improved. First of, the pixels are not organized. Each transfer of 64 bits are 2 pixels from even row and 2 from odd rows. This column-wise de-interlacing simply costs too much for an ARM CPU to do. Secondly, at least a quarter of the bandwidth is wasted at 12bit ADC or lower. Under the 150MHz timing constrain of AXI-DMA. this severely hinders high frame rate operation. Thus I decided to use an interleave transfer mechanism coupled with bit packing. To use interleave transfer, address generation and switching is essential to jump between even and odd rows. As for bit packing, it’s a common trick used in many uncompressed RAW formats (DNG, Nikon NEF). In this instance, I will use 3 bytes 2 pixel packing at 10/12bit ADC.

The most complicated part is 4K boundary check required by AXI protocol. Consider the following code block:

If (Target_addr[11:3] + Burst_count > 4096)

m_axi_s2mm_awlen <= ((Target_addr | 12’hFF8) – Target_addr) >> 3;

else m_axi_s2mm_awlen <= Burst_count – 1;

This is doomed to fail at high frequency due to large number subtraction. The critical path length will be doubled given the bit width of the target address. The above step can be split into pipeline fashion. First checking if condition and set a flag FDRE. In the next cycle, use the flag to direct calculation. The pipeline latency is high in my implementation, totaling at 5 clock cycle assuming constant AWReady state. However, considering that the address can be generated in advanced while the last data burst is in transfer, the actual latency can be ignored.

Eventually, the design is successfully constrained under 200MHz AXI-HP clock. After hardware validation, the actual bandwidth for a 4K@60FPS video will be a whopping of 1.2GB/s given 1.6GB/s available.


Implementation Map: AXI-mem-interconnect; AXI-GP-interconnect; Peripheral Resets

Receiver and Sync Detector; Interleave FIFO Control; S2MM Channel Core; AXI-GP Slave Control Registers

The synthesis and implementation run is only 3 minutes combined, given only 5 core Verilog files within my VDMA IP. The rest is just 2 AXI-interconnects and peripheral resets. The resource utilization is only 11% for LUT and less for FDREs in a 7010 device, leaving much space for future use.

Test pattern looks perfect! In my next post, I’ll showcase some actual images.


Beware of DIFF_TERM

The Vivado tool chain might be wrong occasionally. And it screwed me up big time!

A bit of background where problem occurred

For serial data reception, this is usually straightforward due to each channel is independent from the others. A simple IDELAYE2 could sweep through all possible delay tap values and feed the data to an ISERDESE2 for deserialization. The final value should be set at the mid-point where deserialization is most reliable. But for parallel data, all bits in a word must be perfectly deskewed and aligned! Assume 8 bit data pairs has 16 taps each to scan through, then the combination would be at an astronomically value of 168 ! A workaround is used to call in a second IDELAYE2 to independently validate the result from the primary data path. This would make deskew independent.

How problem occurred

The implemented design works perfectly on hardware on each LVDS word channel. I combined them later on but did not connects the deskew feedback into downstream logic for partial validation. This is where Vivado gets “smart”. It optimized out all my unused logic including the secondary or slave IDELAYE2. But during the process, it “optimized” away the DIFF_TERM from the IBUFDS_DIFF_OUT! This severely impacted the signal integrity in some bit lines, which in turns, making my data reception completely fail.

The following Tcl command can be used to interrogate if all your IBUFDS has DIFF_TERM:

get_property DIFF_TERM [get_cells -hier -regexp {.*IBUFDS_inst}]

If all return is 1, then DIFF_TERM is set to True.


Interfacing Nikon CMOS module with ZYNQ SoC

In my last post I showed a CMOS camera in progress. This time I’m going to deviate from that topic a little bit by interfacing the image sensor module from D5100, the mighty IMX071.

On the same relay board that serves as the carrier card for the microZed, I intentionally included a flex connector for the Nikon module. The connector contains 8 data pairs and one accompanying clock signal, all as sub-LVDS signaling standard. The rest of pins are mostly power switches, SPI for configuration and synchronization signals.


The same connector on the Nikon D5100 motherboard

The great thing about FPGA is its versatile I/O standards. On the ZYNQ fabric side, each IO bank can hosts multiple I/O standards as long as their voltages are the same. Here I combined LVDS25 and LVCMOS25 for control into the IO bank 35. The LVDS25 is required to enable 100 Ohm differential ODT (On Die Termination). A simplified block design is shown below. A 2.5V to 1.8V logic shifter was omitted.


The LVDS clock signal drives the entire logic fabric, which includes 2 major components. First, a signal receiver/decoder that writes data into a FIFO. The the AXI-DMA will transfer the data from FIFO into the DDR3 memory on the processor side through the AXI-HP port. Secondly, a sensor driver responsible for generating the correct line synchronization pulses for the sensor. The driver is configured through the AXI-GP port by the program running on PS.

All things connected, we have microZed stacked on top of my relay board, a D5100 CMOS module on top right and its DCDC power board on bottom right. Upon power up, the logic will load the bitstream configuration. Once this is done the program running on ZYNQ ARM processor will configure the system, interrupts and PS IO port. Then the sensor will light up followed by the register setting sequence. The final step is the actual synchronization driving and DMA run. After data acquisition is completed, the program writes the image from RAM to SD card.


A decoded test image (lens not attached) between vertical blanking regions

We are currently designing the full product! I’ll keep posted!

The Making of a Cooled CMOS Camera – P1

As my last post had suggested, I was working on a camera design. Right now the “prototype”, as I would call it, is in the test phase. The project actually dates back to 3 years ago when we envisioned a large focal area CCD imager customized for deep sky astrophotography. At that time, the price for such a commercialized camera was so prohibitive. The most suitable monochromatic chip was the interline KAL-11002 with a size of 36 x 24mm^2. Unlike full frame CCD which necessitates a mechanical shutter for exposure control, interline could handles this electronically. However, the addition of a shielded VCCD region greatly impacts the quantum efficiency and full well capacity. Beyond that, Kodak CCDs don’t seem to recover QE well enough with microlenses, with peak at 50% and only 30% for 650nm on a B/W device. Later on we started to dig deep into the datasheet and soon we abandoned the project. The accumulated dark current in VCCD was simply too much at the slow readout speed required for decent level of read noise.


The KAL-11002ABA in the original plan

What happened next was dramatic. After getting my hands on D7000 and the hacking, I was shocked by how good CMOS sensor performs. I soon realized the era for CCD in astronomy might come to an end. Sooner or later, it will too embrace the noiseless CMOS in the telescopes. When Kodak span off its imaging division to Truesense, it soon re leased its first CMOS sensor with sub 4e- read noise and CCD-like dark current. We decided to give it a try.


Got the sensor, now big challenges lay ahead. To speed up, I decided to use the microZed SOM board as the embedded controller, at least for the prototype. Thus only the power supplies and connecting PCB had to be designed. The Zynq-7010 will configure the sensor with its SPI MIO from the ARM PS side. The data will be received at the FPGA programming logic (PL) and somehow relay to the PS DDR3 memory. The data can then undergo complex calibration and save to SD card or transfered over GbE/USB.


The microZed SOM with 1GB DDR3 and various I/O

The board is then designed and fabricated with the 754 CPU socket mounting the sensor. The main PCB contains the voltage regulators, oscillator and temperature sensing circuits.



The data lines go through a relay board, which also provides power to Zynq PL I/O banks. The whole stack is then tripled checked before applying power. After weeks of hardware and software debugging, the sensor was finally configured and running at designated frame rate. Now it’s time to work on verilog in order to receive the data. I’m going to cover that in my next part.