CMOS Camera – P6: First Light

In July I finally got the UV/IR cut filter for this camera. I designed a simple filter rack and 3D printed it; the whole thing now fits together nicely in front of the sensor. An IR cut filter is necessary because a large proportion of light pollution falls in the near-infrared spectrum.

Filter rack

UV/IR cut taped to the plastic rack.

With all the hardware in place, I added a single-trigger exposure mode to the camera firmware, along with a corresponding protocol command in the PC software to issue a shutter release.

70SA

The camera is then attached to a SkyRover 70SA astrograph. The camera angle adjuster holds a 12 nm bandwidth Ha filter, which lets me reject light pollution while imaging in front of my house. Focusing through the Ha filter is extremely difficult, so I chose a bright star and pushed the exposure time to maximum during liveview. Finally, before the battery pack (supplying both the AZ-EQ6 mount and my camera) went dry, I managed to obtain 15 frames of 5 minutes each.

NGC7000

No dark frame was used for the first-light image, and guiding was so consistent that hot pixels stayed on the same spots in every frame. Without dithering to move them around, this foiled the kappa-sigma algorithm for hot pixel removal and left the background very noisy. Even so, NGC7000 already shows rich detail!
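For reference, kappa-sigma stacking rejects a pixel sample when it falls more than kappa standard deviations from the per-pixel stack mean, which only catches hot pixels if dithering moves them between frames. A minimal NumPy sketch (the kappa value and iteration count are arbitrary choices, not what my stacking software uses):

import numpy as np

def kappa_sigma_stack(frames, kappa=2.5, iters=3):
    # Mean-stack an (N, H, W) array, clipping per-pixel outliers.
    stack = np.asarray(frames, dtype=np.float64)
    mask = np.ones(stack.shape, dtype=bool)
    for _ in range(iters):
        kept = np.where(mask, stack, np.nan)
        mu = np.nanmean(kept, axis=0)
        sigma = np.nanstd(kept, axis=0)
        # Reject samples more than kappa*sigma from the per-pixel mean.
        mask &= np.abs(stack - mu) <= kappa * sigma
    return np.nanmean(np.where(mask, stack, np.nan), axis=0)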

Remarks

1. This sensor has higher dark current than Sony CMOS sensors, somewhat more than 4-fold at the same temperature. However, its doubling temperature is small; in other words, the dark current drops quickly with cooling. Last time I observed no dark noise at −15°C. Imaging the Horsehead during winter would be brilliant here in Michigan!

2. Power. The sensor consumes ~110 mA at 5 V during long integration, compared to ~400 mA for continuous readout, which is minimal. However, the Zynq SoC plus Ethernet PHY consumes much more than a fully running CMOS sensor, so some power-saving techniques could be employed: throttling the CPU during long integration/standby, powering down the fabric in standby mode, moving the bulk of the RTOS to OCM instead of DDR, and so on. Many of these require substantial work, though.

 

Anyway, I’m going to use this during the solar eclipse here in the USA!

CMOS Camera – P5: Ethernet Liveview

To make camera control easier, I spent the last several weeks building a control scheme based on Ethernet. The camera acts as a server, with lwIP tasks running on FreeRTOS; the client is my computer, on any OS platform. The only thing connecting the two is a 1G Ethernet cable. To speed up development, the client demo program is written in Python 3.

image

Client application based on Tkinter

Once the RTOS boots, a core task sets up the network and opens a listening port. On the client side, all control commands are sent over TCP once the connection is established. At the application layer there is really not much protocol going on: each command is decoded as a magic code followed by the actual command ID. Four commands are established so far:

1. Send Setting

2. Start Capture (RTOS will create the CMOS run task)

3. Halt Capture

4. Send Image

Once the TCP handshake is done, the client can send commands 1 and 2 to begin video capture with the defined settings. During capture, command 4 retrieves the latest image to decode and display in the GUI. The camera settings include exposure time and gain, frame definition and on-chip binning, shutter mode and ADC depth, as well as many other readout-related registers.
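A minimal Python 3 client along these lines is sketched below; the magic code, command IDs, address, and payload layout are placeholders for illustration, not the actual values in my protocol:

import socket
import struct

MAGIC = 0xCA5E                 # placeholder magic code
CMD_SEND_SETTING = 1
CMD_START_CAPTURE = 2
CMD_HALT_CAPTURE = 3
CMD_SEND_IMAGE = 4

def send_cmd(sock, cmd_id, payload=b""):
    # Every command starts with the magic code, then the command ID.
    sock.sendall(struct.pack("<HH", MAGIC, cmd_id) + payload)

def recv_exact(sock, n):
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("camera closed the connection")
        buf += chunk
    return bytes(buf)

with socket.create_connection(("192.168.1.10", 5000)) as s:
    send_cmd(s, CMD_SEND_SETTING, struct.pack("<IH", 100_000, 1))  # exposure (us), gain
    send_cmd(s, CMD_START_CAPTURE)
    send_cmd(s, CMD_SEND_IMAGE)
    raw = recv_exact(s, 1920 * 1080 * 2)   # one 16-bit frame, size assumed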

Images are transferred as RAW data, which is linear, so NumPy functions are very helpful for implementing level control and post-readout binning. RAW images can be written to disk as RAW video, given fast enough I/O.
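As a sketch of what that looks like with NumPy (black/white level stretch and 2×2 software binning; the function names and parameters are mine, for illustration):

import numpy as np

def level_stretch(img, black, white):
    # Map [black, white] linearly onto 8 bits for display.
    scaled = (img.astype(np.float32) - black) / float(white - black)
    return (np.clip(scaled, 0.0, 1.0) * 255.0).astype(np.uint8)

def bin2x2(img):
    # Post-readout 2x2 binning by summing neighboring pixels.
    h, w = img.shape
    return img[:h & ~1, :w & ~1].reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))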

Several improvements are in progress. First and foremost is Ethernet performance. In a direct point-to-point connection there should be no reliability issues. In testing, TCP achieves ~75 MB/s over Gigabit Ethernet; UDP would be even faster but would have to cope with potential packet drops. Either way, TCP can handle 24 FPS 1080p liveview, though both server and client need optimization. Other open items include the file-saving task on the RTOS and better long-exposure control.

Update 6/24

Some updates on the board's operating system.

1. By modifying the socket API, I incorporated zero-copy TCP operation: the pointer to the data memory is passed directly to the EMAC task, with no in-stack memcpy involved. This provides a 15% bandwidth gain under TCP operation; top payload speed is around 70 MB/s.

2. I added an interrupt event to the SDIO driver to avoid polling the status register, so I/O no longer wastes CPU cycles and the single core can keep running the EMAC listening task. As a result, SD file I/O can be performed simultaneously with video liveview.

Cooled CMOS Camera – P4: Lens Mount

Things have been going slowly recently. Instead of improving the image acquisition pipeline, I decided to apply some mechanical touches to make the assembly more stable. The PCI-E connector is, without a doubt, the weakest link in the entire structure. I also need to actually make this a camera by mounting a lens on it, instead of leaving it as several pieces of PCB.

Drawing_1Drawing_2

3D Visualization with PCBs

Notice that the side-plate linkage consists of three slots instead of holes; this was designed for tuning the flange distance from the focal plane. Both PCBs are mounted on M3×0.5 mm standoffs, just like a motherboard in a computer case.

ASM_BackASM_FrontMount

View through the lens mount

An EF macro extension tube is used to mount the lens; the flange distance is approximately 44 mm. The electrical contacts are left floating for now. I attached a 50 mm f/1.8D lens using a mount adapter.

50mm Lens

The first image this camera saw, through my window.

Cooled CMOS Camera – P3: Image Quality

In the previous post I successfully obtained the test pattern with the custom VDMA core. The next step is to implement an operating system on the camera and control software on the host machine; to get real-time liveview and control, both need to be developed in parallel. In this post, though, let's take a look at image quality with a simple bare-metal application.

The sensor is capable of 10 FPS at 14-bit, 30 FPS at 12-bit, or 70 FPS at 10-bit ADC resolution. For astrophotography, 14-bit gives the best dynamic range and achieves unity gain at the default setting. The sensor's IR filter holder and the camera mounting plate are still in design, so I will only provide a glimpse of some bias and dark images at this moment.

To facilitate dark current estimation, the protective tape on the cover glass was glued to a piece of cardboard, and the whole sensor was then shielded from light with a metal can lid. Lastly, the camera assembly was placed inside a box and exposed to the −15°C winter temperature. The camera then continuously acquired 2-minute dark frames for 2 hours, followed by 50 bias frames.

Bias Hist

Pixel Intensity distribution for a 2×4 repeating block (Magenta, Green, Blue for odd rows)

The above distribution reflects a RAW bias frame. It appears each readout bank has a different bias voltage by construction; the readout bank assignment is a 2-row by 4-column repeating pattern, one color per channel. The spikes in the histogram at regular intervals imply that a scaling factor is applied to odd rows post-digitization to correct for uneven gain between the top and bottom ADCs.
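Under that assignment, each bank can be isolated with strided slicing to compare bias levels directly; a quick NumPy check along these lines (assuming the 2×4 tiling described above):

import numpy as np

def bank_medians(bias):
    # Median bias level of each bank in the 2-row x 4-column tiling.
    return np.array([[np.median(bias[r::2, c::4]) for c in range(4)]
                     for r in range(2)])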

Read Noise Distribution

Read Noise – Mode 3.12 Median 4.13 Mean 4.81

The read noise distribution is obtained by taking, for each pixel, the standard deviation across the 50 bias frames, then plotting the distribution to examine the mode, median, and mean. The result is much better than a typical CCD.
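In NumPy this is only a few lines over the bias stack; a sketch, assuming the 50 frames are loaded as a (50, H, W) array already scaled to electrons:

import numpy as np

def read_noise_stats(bias_stack):
    # Per-pixel standard deviation across the bias frames.
    rn = np.std(bias_stack.astype(np.float64), axis=0, ddof=1)
    hist, edges = np.histogram(rn, bins=512)
    mode = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    return rn, mode, np.median(rn), np.mean(rn)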

Dark_current_minus_15

Finally, the dark current in a series of 2-minute exposures is measured after subtracting the master bias frame. Two interesting observations: 1. the density plot gets sharper (taller and narrower) as temperature decreases, corresponding to an ever lower dark generation rate at colder temperatures; 2. the bias drifts with temperature, which could originate in my voltage regulator, in the sensor, or in a combination of the two.

Bias drift is usually compensated internally by the clamping circuit prior to the ADC, but I had to turn this calibration off due to a specific issue with this particular sensor design (I will elaborate in a later post). To measure the dark generation rate, I therefore use the FWHM of the noise distribution and compare it against that of a bias frame. At temperature stabilization, the FWHM registered at 8.774 e−, while the corrected bias gives 8.415 e−. For a Gaussian distribution, the FWHM is 2.3548 times sigma; since the noise sources are independent, the variance of the accumulated dark current is 1.113 e−². As such, the dark generation rate at this temperature is less than 0.01 e−/s. Excellent!
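Spelled out, the arithmetic is: sigma = FWHM / 2.3548 for a Gaussian, variances of independent noise sources add, and for shot noise the variance equals the accumulated signal in electrons:

# Dark generation rate from the FWHM values above.
FWHM_TO_SIGMA = 1.0 / 2.3548
sigma_total = 8.774 * FWHM_TO_SIGMA          # dark + read noise, e-
sigma_bias = 8.415 * FWHM_TO_SIGMA           # read noise alone, e-
var_dark = sigma_total**2 - sigma_bias**2    # ~1.113 e-^2 (independent sources)
rate = var_dark / 120.0                      # shot noise: variance == signal; ~0.009 e-/s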

Preliminary Summary

The sensor performs well in terms of noise. For long exposures, the dark generation rate of this CMOS is more sensitive to temperature change than that of CCDs: the dark current is massively reduced when cooled below the freezing point, and the doubling temperature is below 5°C.
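The usual model here is exponential in temperature: D(T) = D(T0) · 2^((T − T0)/Td), where Td is the doubling temperature. A sketch with illustrative numbers (not measurements):

def dark_current(T, D0, T0, Td):
    # Dark generation rate at temperature T, doubling every Td degrees C.
    return D0 * 2.0 ** ((T - T0) / Td)

# With Td = 5 C, cooling from +20 C to -15 C cuts dark current 2^7 = 128-fold:
print(dark_current(-15.0, 1.0, 20.0, 5.0))   # ~0.0078 of the +20 C value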

LEXP_001

An uncorrected dark frame after a 120 s exposure, showing visible column bias and hot pixels

The Making of a Cooled CMOS Camera – P2

In the last post, I uncovered a bug in the Vivado implementation flow that accidentally removed the DIFF_TERM from my input buffers. With that problem solved, I picked up the project again with the goal of achieving high-speed imaging. Here I'm going to cover the design principles and the intermediate steps taken to get there.

VDMA

End Result – My customized VDMA IP highlighted

The end goal is a unified, resource-efficient, and high-performance VDMA that receives, decodes, and transfers data to external memory; in this case, the DDR3 on the PS side. The screenshot above looks satisfyingly concise, but in reality any error along the data path complicates the entire debugging process. So I decided to tackle the modules one at a time from upstream, only combining them in the final run.

Step 1

The initial step is to make sure all receiving banks decode the SOL/EOL sync codes properly. The stock ILA IP can be attached immediately downstream of the receiving module to verify each bank's function. The ILA requires a free-running clock, so I had to run the sensor in high-power, continuous mode to feed a constant clock into the FPGA.

Step 2

External FIFO

Intermediate Design: Receiver feeding multiple external AXI-S Data FIFOs

After verifying that all 4 banks function, the next step is to receive the data with integrity. Due to inter-bank skew, pixels from different banks can arrive in slightly different clock periods, and the source-synchronous clocks are not aligned across banks. An asynchronous FIFO is the answer: each bank feeds a FIFO at its own source clock, but all 4 FIFOs are read out under a single internal clock once all of them have data available. The AXI-Stream Infrastructure IP set contains a Data FIFO for the AXI4-S protocol; its TREADY on the slave port will not deassert until the FIFO is full. To read from all 4 FIFOs in an aligned manner, all data must be valid:

// Pop all four FIFOs together only when every bank has a valid word.
assign Ready = (Data_valid_0_in & Data_valid_1_in) & (Data_valid_2_in & Data_valid_3_in);

Step 3

The above method was tested under Vivado 2015.2 but no longer works in the most recent version, for reasons unknown. This drove me to the FIFO_DUAL_CLOCK_MACRO provided by Xilinx, which is essentially a BRAM with built-in hardware asynchronous FIFO support on 7-series devices. The read enable must be deasserted in the same cycle that any of the Empty flags rises:

// Read all four FIFOs in lockstep; stall the moment any one runs empty.
assign RDEN_all = !EMPTY_0 & !EMPTY_1 & !EMPTY_2 & !EMPTY_3;

Also, the FIFO must be held in reset for 5 cycles on both the read and write clocks before use, so I implemented a reset handshake that asserts reset as soon as the source clock starts running. This leaves plenty of time before actual data arrives. Downstream, the data passes through an AXI-S Data FIFO buffer before reaching the AXI-DMA.

 Internal FIFO

Final Step

So far so good; I've got a complete image. But several areas needed improvement. First off, the pixels are not organized: each 64-bit transfer carries 2 pixels from an even row and 2 from an odd row, and this column-wise de-interlacing simply costs too much for an ARM CPU. Secondly, at least a quarter of the bandwidth is wasted at 12-bit ADC resolution or lower, which, under the 150 MHz timing constraint of the AXI-DMA, severely hinders high-frame-rate operation. So I decided to use an interleaved transfer mechanism coupled with bit packing. For interleaved transfers, address generation and switching is essential to jump between even and odd rows. As for bit packing, it's a common trick used in many uncompressed RAW formats (DNG, Nikon NEF); in this instance, I pack 2 pixels into 3 bytes at 10/12-bit ADC resolution.
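To give a flavor of the packing, here is how the 3-bytes-per-2-pixels layout can be unpacked on the host; the exact bit order within each byte triplet is an assumption for illustration:

import numpy as np

def unpack_12bit(raw, width, height):
    # Unpack 2-pixels-in-3-bytes 12-bit RAW into a uint16 image.
    b = np.frombuffer(raw, dtype=np.uint8).reshape(-1, 3).astype(np.uint16)
    p0 = (b[:, 0] << 4) | (b[:, 1] >> 4)       # byte0 plus high nibble of byte1
    p1 = ((b[:, 1] & 0x0F) << 8) | b[:, 2]     # low nibble of byte1 plus byte2
    out = np.empty(p0.size * 2, dtype=np.uint16)
    out[0::2], out[1::2] = p0, p1
    return out.reshape(height, width)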

The most complicated part is the 4 KB boundary check required by the AXI protocol. Consider the following code block:

// Clip AWLEN so that no burst crosses a 4 KB boundary (AXI requirement).
// Target_addr[11:3] is the 64-bit beat offset within the current 4 KB page.
if (Target_addr[11:3] + Burst_count > 512)
    m_axi_s2mm_awlen <= ((Target_addr | 12'hFF8) - Target_addr) >> 3;
else
    m_axi_s2mm_awlen <= Burst_count - 1;

This is doomed to fail at high clock frequency due to the wide subtraction: the critical path roughly doubles with the bit width of the target address. The step can instead be split into a pipeline: first evaluate the comparison and register the result in a flag (an FDRE); in the next cycle, use the flag to steer the calculation. The pipeline latency in my implementation is high, totaling 5 clock cycles assuming AWREADY stays asserted. However, since the next address can be generated in advance while the last data burst is still in flight, the latency is effectively hidden.

Eventually, the design was successfully constrained at a 200 MHz AXI-HP clock. After hardware validation, the actual bandwidth for 4K @ 60 FPS video is a whopping 1.2 GB/s out of the 1.6 GB/s available.

Implementation

Implementation Map: AXI-mem-interconnect; AXI-GP-interconnect; Peripheral Resets

Receiver and Sync Detector; Interleave FIFO Control; S2MM Channel Core; AXI-GP Slave Control Registers

The synthesis and implementation runs take only 3 minutes combined, since there are only 5 core Verilog files in my VDMA IP; the rest is just 2 AXI interconnects and the peripheral resets. Resource utilization is only 11% of LUTs, and less for FDREs, on a 7010 device, leaving plenty of room for future use.

The test pattern looks perfect! In my next post, I'll showcase some actual images.

2016/12/5

Beware of DIFF_TERM

The Vivado toolchain can occasionally be wrong, and this time it screwed me up big time!

A bit of background on where the problem occurred

Serial data reception is usually straightforward because each channel is independent of the others: a simple IDELAYE2 can sweep through all possible delay tap values and feed the data to an ISERDESE2 for deserialization, with the final value set at the midpoint where deserialization is most reliable. But for parallel data, all bits in a word must be perfectly deskewed and aligned! Assume 8 data pairs each have 16 taps to scan: the number of combinations would be an astronomical 16^8! The workaround is to call in a second IDELAYE2 to independently validate the result from the primary data path, which makes the deskew of each bit independent.

How the problem occurred

The implemented design worked perfectly in hardware for each individual LVDS word channel. I later combined them, but did not connect the deskew feedback into the downstream logic, for partial validation. This is where Vivado got "smart": it optimized out all my unused logic, including the secondary (slave) IDELAYE2. In the process, it also "optimized" away the DIFF_TERM from the IBUFDS_DIFF_OUT! This severely impacted the signal integrity of some bit lines, which in turn made my data reception fail completely.

The following Tcl command can be used to check whether all your IBUFDS instances have DIFF_TERM:

get_property DIFF_TERM [get_cells -hier -regexp {.*IBUFDS_inst}]

If every returned value is 1, DIFF_TERM is set to TRUE.

 

Interfacing Nikon CMOS module with ZYNQ SoC

In my last post I showed a CMOS camera in progress. This time I'm going to deviate from that topic a little by interfacing with the image sensor module from the D5100, the mighty IMX071.

On the same relay board that serves as the carrier card for the MicroZed, I intentionally included a flex connector for the Nikon module. The connector carries 8 data pairs and one accompanying clock signal, all using the sub-LVDS signaling standard. The rest of the pins are mostly power switches, SPI for configuration, and synchronization signals.

image

The same connector on the Nikon D5100 motherboard

The great thing about an FPGA is its versatile I/O standards. On the ZYNQ fabric side, each I/O bank can host multiple I/O standards as long as their voltages are the same. Here I combined LVDS25 for data and LVCMOS25 for control in I/O bank 35; LVDS25 is required to enable the 100 Ω differential ODT (On-Die Termination). A simplified block design is shown below; a 2.5 V to 1.8 V logic shifter is omitted.

Block_design_nikon

The LVDS clock signal drives the entire logic fabric, which includes 2 major components. First, a signal receiver/decoder that writes data into a FIFO; the AXI-DMA then transfers the data from the FIFO into the DDR3 memory on the processor side through the AXI-HP port. Second, a sensor driver responsible for generating the correct line synchronization pulses for the sensor; the driver is configured through the AXI-GP port by the program running on the PS.

With everything connected, we have the MicroZed stacked on top of my relay board, the D5100 CMOS module at top right, and its DC-DC power board at bottom right. Upon power-up, the logic loads the bitstream configuration. Once this is done, the program running on the ZYNQ ARM processor configures the system, interrupts, and PS I/O ports. Then the sensor powers up, followed by the register setting sequence. The final step is the actual synchronization driving and the DMA run. After data acquisition completes, the program writes the image from RAM to the SD card.

Decoded_image

A decoded test image (lens not attached) between vertical blanking regions

We are currently designing the full product. I'll keep you posted!