National Park Time Lapse – Tranquility

Since my last astrophotography road trip in California two and a half years ago, I really haven’t spent any time writing about travel and photography. Amid the camera project and my PhD, I have somehow accumulated a pile of decent photographs yet to be processed or released. But then, all that hard work serves to produce better images, doesn’t it? So I took a break over the past few weeks to finish off some of that leftover photo work.

Please enjoy my second time lapse compilation – Tranquility

Included are some of the time lapses I took in Big Bend NP, Mojave National Preserve, Death Valley, Pictured Rocks National Lakeshore and Shenandoah NP. Then there’s also Jiuzhai Valley in Sichuan, China!

In terms of astrophotography, I only have a few shots left on my hard drive for release. The road trips I covered recently were on the east coast. With light pollution and bad weather along the way, there really weren’t many stars to be seen, let alone opportunities for deep-space imaging.


Wide Field Milky Way Center shot in Death Valley

As for 360 panoramas, they have become routine for me now that the pipeline for 3×6 stitching is well established. In the meantime I have started to incorporate the floor image in the stitching process.

Carlsbad Caverns · The Window · White Sands · Big Bend · Porcupine Mountains · Tybee Island Lighthouse · Shenandoah · Death Valley

Mouse over for location, Click for 360 View

The link to my first time lapse compilation is here:

Full speed ahead – My new generic VDMA

In 2016, when I built my KAC-12040 camera, I wasn’t satisfied with the Xilinx VDMA IP. It closes timing only at 150MHz, nor does it support arbitrary sizes for a compressed stream. So I wrote my own DMA engine to exploit the full bandwidth of the AXI-HP port on 7-series devices. I managed to close timing at 200MHz with a 64-bit bus. Back then, when my carrier card only supported 4 LVDS banks from that sensor, this bandwidth was more than enough for a 1280×720 RAW stream at 600 FPS.

But to achieve this I had to overclock the LVDS transmitters, which led to stability issues on my engineering-grade sensor at high frequencies. To circumvent this, I implemented a sync error detector and dynamically dropped bad frames. This solution works well at moderate overclocking, but as the frequency approaches 200% the drops become so severe that the gain is meaningless. As a result, I couldn’t push further for higher horizontal resolution.

In 2017 I ruled the KAC out as my astro-imaging candidate due to image quality concerns and the availability of better alternatives. From that point on I decided to unleash its full potential as a pure high-speed camera. This required bringing all 8 banks of LVDS signals into the 7010 FPGA. It was done by routing only a single LVDS clock into an MRCC pin and dynamically figuring out the phase relationship for each data bank. Additionally, I could discard the MSB pins on the later LVDS banks, since 12/14-bit ADC readout is slow enough to use only the first two or four banks. This strategy freed enough I/Os for banks 4~7 during high-speed operation.

Layout without length matching; all eye diagrams and phases are determined at run time



Soldered PCB with a socket


This bandwidth requires a higher AXI clock frequency. The limit set by the 7-series AXI port is 250MHz. It was time to completely rewrite my VDMA. Timing closure poses a significant challenge now, especially when dealing with Artix-7 C-1 speed grade fabric. But some careful analysis of my previous design exposed the critical paths to improve:

1. The routing into the hardened ARM processor is very long, and a 64-bit bus can easily cause routing congestion. The solution: avoid additional combinatorial logic on the data path. Have a register with its Q output go directly into the ARM processor. Alternatively, prepare the write data and use a FIFO BRAM as the buffer, with the read-enable pin as the release control. I chose the second option as it’s more elegant. By setting the internal DO_REG = 1, the FIFO gains one more cycle of latency but significantly improved timing.

TData/TLast go directly into the ARM processor without an interconnect

2. The AXI Interconnect is not that good: the additional logic converting bursts into 16-beat transactions wastes logic. Thus I parameterized C_BURST_SIZE and the AWLEN bit width correspondingly. When I set the burst size to 16, the port conforms to the AXI3 of the Zynq-7000 ARM processor and the entire AXI Interconnect is optimized away at IP integration.

3. Scan TLast and count bursts as the stream comes in, and issue address writes accordingly. Pipeline the logic as needed and insert a double buffer (skid buffer) when necessary.

The result is quite satisfactory: 328 flip-flops and 276 look-up tables. Timing closure is effortless now.

In the meantime, I rewrote my bit concatenator IP, which packs bursts of arbitrary bit length into 64-bit words. The length of each burst is carried in TUser. This IP copes with compressed streams and bit depths that change dynamically during sensor operation.
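To make the concatenator’s job concrete, here is a minimal software model of the packing step (a sketch of the behavior, not the RTL; in hardware the burst length arrives on TUser each beat):

```python
def concat_bursts(bursts):
    """Pack (value, nbits) bursts MSB-first into 64-bit words.

    Returns the completed 64-bit words plus the leftover bits that
    would wait in the accumulator for the next beat.
    """
    acc, acc_bits, words = 0, 0, []
    for value, nbits in bursts:
        acc = (acc << nbits) | (value & ((1 << nbits) - 1))
        acc_bits += nbits
        while acc_bits >= 64:           # release a full 64-bit word
            acc_bits -= 64
            words.append(acc >> acc_bits)
            acc &= (1 << acc_bits) - 1  # keep only the remainder
    return words, (acc, acc_bits)

# Six 12-bit bursts of all ones: one full word, 8 bits left over
words, (rem, rem_bits) = concat_bursts([(0xFFF, 12)] * 6)
```

In hardware the same accumulate-and-release structure becomes a barrel shifter plus a carry register between clock cycles.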


Now I use this block in almost every camera. With some modification to the LVDS receiver in this one, I can now stream 16Gbps. At extended width, 3600×720 streams stably at 600FPS!

This speed can also cope with the insane data rates of SLVS-EC sensors. Once decoded to 8-bit, 8 channels yield 1.85GB/s.
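That 1.85GB/s figure is easy to sanity-check. Assuming the 2.304 Gbps SLVS-EC per-lane line rate and 8b/10b line coding (both are my assumptions about the interface, not numbers from the measurement above):

```python
lane_rate_gbps = 2.304          # assumed SLVS-EC per-lane line rate
lanes = 8
encoding_efficiency = 8 / 10    # assumed 8b/10b line coding

payload_gbps = lane_rate_gbps * lanes * encoding_efficiency
payload_gBps = payload_gbps / 8   # decoded payload in gigabytes/s
```

The result is about 1.84 GB/s, matching the figure quoted above.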

No datasheet, No FAE, No problem! – The proper way to hack Nikon D850

Two years ago, we identified the sensor inside the Nikon D850 with ChipMod lab. There’s plenty of justification for using this sensor in astronomy. It’s the first mass-produced back-illuminated full-frame CMOS. It is very fast, supporting movie resolutions up to 4K30P and 720P at 120FPS. It also has an electronic first curtain and a scan rate fast enough to enable a fully silent rolling-shutter capture mode. The chip packaging is very compact, with a single connector, and a metal frame directly attached to the sensor enables quick thermal dissipation. It could hardly be more perfect.

This article will be technical in many aspects and serves as a record of how we approach such problems in the R&D process. Let’s roll!

Initial probing and speculation

Usually the first step is to separate the power rails from the signal buses. Power traces are typically thick and connect to multiple pins to reduce resistance and impedance, allowing high current to flow, and many have capacitors right next to them to reduce ripple. At the same time we can find the ground pins using a multimeter, as well as the control pins connected to each power regulator.

Four thick traces ending with big electrolytic capacitors are power rails

Next we can search for the connector using some basic measurements such as pin pitch, pin count and connector type. This one is clearly a mezzanine connector with a middle slot. DigiKey and Mouser can quickly filter their vast inventories down to just tens of components. Then it’s really just reading datasheets to see which one matches.

The flex cable tells more. There are 17 differential lanes, which usually means one functions as a clock among 16 others for data. This extra clock lane tells us that no embedded clock recovery like PCI-E or USB3 is needed, and that the speed should be low enough for most cost-sensitive FPGAs.

The other traces are power enables and sensor controls. Judging from typical large Sony sensors, this one probably again uses SPI running in HD/VD slave driving mode.

Using a sniffer board

This time Nikon designed the connector well, so well in fact that I could flip the flex cable and put the sensor above the main PCB. I decided to build a stacking female-male board for data logging.

This board is just a passive pass-through for most signal traces, while the control buses can be tapped.

Flipping the flex cable around and exposing the sensor for easy connection

Logic analysis

As expected, this sensor shares the common SPI protocol just like the IMX071 and IMX094, except that the increased functionality requires a 16-bit address space for more registers. In still capture mode, the line period is around 11us. This gives a whopping 15FPS in 14-bit mode, truly amazing! From here I can make some rough deductions about the clock frequency and data rate.

Because each data line contains at least 8256 pixels, this distributes 7224 bits over each lane. The minimum required lane rate is 657Mbps. The clock supplied to the sensor’s internal PLL is 72MHz, so the closest multiple should be 720MHz internally, with a 360MHz DDR clock.

720Mbps should be OK for most FPGA I/O banks. Great!
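The deduction above can be written out in a few lines (all numbers are from the measurements and datasheet values already mentioned):

```python
import math

pixels_per_line = 8256      # pixels per data line, at least
bit_depth = 14              # still capture mode ADC depth
lanes = 16                  # data lanes on the flex cable
line_period_us = 11.0       # measured line period

bits_per_lane = pixels_per_line * bit_depth // lanes   # 7224 bits
min_rate_mbps = bits_per_lane / line_period_us         # ~656.7 Mbps minimum
# The sensor PLL input is 72 MHz; take the next integer multiple up
internal_mhz = 72 * math.ceil(min_rate_mbps / 72)      # 720 MHz
ddr_clock_mhz = internal_mhz // 2                      # 360 MHz DDR clock
```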

For the detailed SPI protocol, I wrote some bash and python scripts to automatically compare the settings between different modes. ISO, shutter, digital gain and region of interest are clearly identified. For the electronic first curtain, the IMX309AQJ appears to use internal register settings to drive the charge reset scan. This contrasts with the Sony A-series, which uses an external pulse.
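The core of such a comparison script is just a dictionary diff over two register dumps. This is a minimal sketch in the same spirit, not my actual script, and the addresses in the example are hypothetical, not the real IMX309AQJ register map:

```python
def diff_registers(mode_a, mode_b):
    """Compare two {address: value} register dumps; report changes."""
    changed = {}
    for addr in sorted(set(mode_a) | set(mode_b)):
        va, vb = mode_a.get(addr), mode_b.get(addr)
        if va != vb:
            changed[addr] = (va, vb)
    return changed

# e.g. an ISO change shows up as a single register delta
delta = diff_registers({0x0204: 0x10, 0x0300: 0x01},
                       {0x0204: 0x20, 0x0300: 0x01})
```

Diffing dumps pairwise between two modes that differ by a single camera setting isolates which registers that setting controls.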

Power sequencing

Another important aspect is power sequencing. Correct supply voltages and power-on timing are essential. Some mixed-signal ICs even require strict power sequencing just to prevent frying the circuit. A logic analyzer is capable of capturing such a sequence, but capacitor charging and discharging delays the edges a lot, so in most cases a scope is preferred.

Digital signals rise quickly, but power rails slowly charge their decoupling capacitors

Some rails can sleep during long exposures while the digital circuits are inactive. It’s beneficial to log their behavior and their timing relative to other control signals as well. In all, six voltage rails supply this sensor module, three of which are low-voltage 1.2V rails for the digital logic and high-speed interfaces.

Layout of carrier card

With the information almost complete, I could lay out a carrier card to include the necessary power regulators and bridge the high-speed signals into the FPGA. Only a single I/O bank is needed.

Carrier card in black solder mask

The carrier card is designed with the IMX309AQJ sensor mechanically centered relative to the MicroZed SoM. Due to board-to-board stack height and room constraints, most regulators are placed on the back side. I wrote simple power sequencing logic mimicking the D850’s and verified all voltages were correct. With the sensor attached and power applied for the first time, I breathed a sigh of relief. Nothing went up in smoke. Great!

Driving the sensor and verifying the clock frequency

For fast bring-up, I could duplicate a register configuration from liveview, where the sensor is free-running. This drives the high-speed clock continuously for frequency measurement. There are ways to do this without a high-performance scope, using only digital counting logic.

reg [15:0] freq_counter = 0;
reg toggle_sync = 0, toggle_sync1, toggle_sync2;
reg [15:0] freq_counter_prev;
// Target clock domain: free-running tick counter
always @(posedge clk_div) begin
    toggle_sync1 <= toggle_sync;
    toggle_sync2 <= toggle_sync1;

    freq_counter <= freq_counter + 1;
    if (toggle_sync2 != toggle_sync1) begin
        freq_counter <= 'h0;                // restart for the next window
        freq_counter_prev <= freq_counter;  // latch the tick count
    end
end

reg [9:0] standard_counter = 0;
reg [15:0] freq_counter_sync;
// Reference clock domain: generate the 10us gate window
always @(posedge s00_axi_aclk) begin
    standard_counter <= standard_counter + 1;
    freq_counter_sync <= freq_counter_prev;
    if (standard_counter == 999) begin
        standard_counter <= 0;
        toggle_sync <= ~toggle_sync;        // signal across the clock domain
    end
end

Frequency measurement logic

The idea is to latch the tick count at fixed absolute time intervals. The s00_axi_aclk is a standard 100MHz clock; after 1000 ticks (10us) we signal across the clock domain to latch the tick counter in the target clock domain and reset it. Since measuring the full-rate clock directly can pose timing constraint issues, we use a BUFR to divide it down to clk_div.

The frequency matches my guess of 360MHz. This is a DDR clock, sampling on both rising and falling edges.
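Converting the latched count back to a frequency is a one-liner. As a worked example (the BUFR divide ratio of 4 and the latched count of 900 here are illustrative assumptions, not values quoted above):

```python
axi_clk_hz = 100e6                    # s00_axi_aclk reference
window_s = 1000 / axi_clk_hz          # 10 us gate window
bufr_divide = 4                       # assumed BUFR divide ratio
latched_ticks = 900                   # hypothetical freq_counter_prev reading

f_hz = latched_ticks / window_s * bufr_divide   # recovered clock frequency
```

With those example numbers the recovered clock is 360 MHz, consistent with the measurement above.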

Sensor stack on carrier card

Getting data with ISERDES and DMA

I did not know the phase relationship between this clock and the data lanes, so I migrated my Dynamic Phase Alignment algorithm from the KAC camera to this one. It lets the IOB scan the transition edges to build an eye diagram and sample at the best possible location.

There are 16 lanes. I used 1:4 deserialization in ISERDESE2 to construct a 64-bit AXI stream to feed the DMA. The data can then be analyzed later as a binary file.

Sync sequence and data format

Most large-format Sony sensors use a bespoke synchronization sequence rather than the SAV/EAV defined by ITU. From an xxd dump of the DMA binary file, I found this sensor to be no different.

First line – FFF 000 FFF 000 FFF

Other lines – FFF 000 FFF 000 000

There are no end-of-line sequences, so counters must be used in conjunction with an SOL sequence detector state machine.
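In software the detector reduces to matching the two 5-word patterns over a stream of 12-bit words. A minimal sketch (the hardware version is a state machine, not a sliding window, but the matching logic is the same):

```python
FIRST_LINE = (0xFFF, 0x000, 0xFFF, 0x000, 0xFFF)
OTHER_LINE = (0xFFF, 0x000, 0xFFF, 0x000, 0x000)

def find_sol(words):
    """Scan 12-bit words for start-of-line markers.

    Returns (index, is_first_line) per match. Because there is no
    end-of-line code, the caller must count a fixed number of pixels
    after each SOL, as the hardware counters do.
    """
    hits = []
    for i in range(len(words) - 4):
        w = tuple(words[i:i + 5])
        if w == FIRST_LINE:
            hits.append((i, True))
        elif w == OTHER_LINE:
            hits.append((i, False))
    return hits
```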

Viewing properly formatted image stream

With this information ready, I could implement line and row counters for a proper VDMA. The same DMA IP I built for the KAC-12040 provides a 1.6GB/s data rate. Properly formatted images can then be transferred into memory.

As usual, with everything in shape, I implemented a FreeRTOS system to handle Ethernet command and control. Video streams are based on UDP packets. On a typical POSIX system like macOS or Linux, there are no packet drops over a direct 1G connection.

Right now the lens mount is Canon EOS. Unfortunately I do not have an adapter or an EF lens with me. I will post real images later.

Line skipping and video formats

Before thinking about functionality, I first needed to figure out the operation modes. This can be done by comparing register settings, and some simple python scripting enables such comparisons.

All still modes use 14-bit ADC readout, including the complete silent mode. To speed up for video, the sensor has to sacrifice ADC resolution and readout lines. In total, I found 4 different driving modes. All video modes use 12-bit readout.

Liveview base/1920×1080/1280×720 FX/DX 60 – 24 FPS

1280×720 FX/DX 120/100FPS

4K FX 30 – 24 FPS

4K DX / Liveview Zoom mode

The first two modes are full-sensor-area readouts with horizontal binning and vertical line skipping (subsampling). The DX 4K and magnification modes are 1:1 windowed readouts, and I can see no color aliasing. The FX 4K mode scans a larger area and appears to use vertical binning instead of skipping for better quality.

Additional functionalities

With the above analysis, I could isolate the registers responsible for ISO analog gain, digital gain, exposure time and window cropping. Some of these functions can be applied to modes where they are not normally enabled. I combined 14-bit readout with window cropping, so in SNR-critical scenarios data from a small region can be captured.


Partial readout with lots of vertical blanking

A lot more awaits discovery!


Update 9/16

I played around with the various movie modes; here are some updates on the imaging sizes.

The 4K high-resolution DX movie mode runs 5520 x 3070 with a 7.5us per-row readout. In FX mode it is 8352 x 2328 at 9.34us per row. An additional 22 rows ahead of each frame serve for bias calibration. The 4K output is the result of downsizing from these imaging areas. The DX readout area is roughly 16:9 within the APS-C crop region of this sensor. The sensor can perform 14-bit ADC in 11us; at 12-bit readout, a single-slope ADC completes in about a quarter of that time. The line rate is limited not by the ADC but by how fast data can be sent off, so limiting the horizontal region is a perfect solution here. In FX readout mode the aspect ratio is doubled, close to 32:9. If all 4656 lines were read, the frame rate could only reach 23FPS. What Sony probably did was run the ADC twice on alternating lines and vector-add the values before sending them off; the ISP then downsizes on the X-axis.
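A quick back-of-the-envelope check of these numbers (all values taken from the measurements above; the quarter-time figure follows from a single-slope ramp needing 2^12 instead of 2^14 steps):

```python
dx_frame_ms = 3070 * 7.5e-3           # DX 4K: ~23.0 ms per frame
fx_frame_ms = 2328 * 9.34e-3          # FX 4K: ~21.7 ms per frame
full_fx_fps = 1e3 / (4656 * 9.34e-3)  # reading all 4656 lines: ~23 FPS

adc_14bit_us = 11.0
adc_12bit_us = adc_14bit_us / 4       # single-slope: 4x fewer ramp steps
```

Both 4K modes comfortably clear the ~33 ms budget of 30 FPS, while a naive full-height FX readout would not, which is what motivates the line-halving trick above.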

In 1080P mode the resolution happens to be 1/3 in each direction: 2784 x 1854. This mode is similar to the IMX071 we have seen before: the sensor bins the horizontal pixels internally and then skips two rows for each row read.

Dark sky backyard of Michigan

In 2016 I got my AZ-EQ6. Without a proper imaging setup, I did mostly visual observation with my 150 Rumak. In 2017, when the 70SA Gen II was released, I immediately acquired one. It is a compact quadruplet astrograph offering a very wide field of view at F5. Usually scopes this small only correct for cropped sensors, but this one covers a 135 full-frame image circle.

Two SkyRover 70SA with different front AR coating

So in June 2017 I restarted my astrophotography journey. Because Ann Arbor is so close to the Detroit metropolitan area, light pollution is a problem. Even if I drive 20 minutes west to the university’s radio telescope facility, the light dome from Jackson to the southwest is still an issue. The best place without going north is a small state park called Lake Hudson, close to the Ohio border.

Lake Hudson is 50 miles away, a one-hour drive

My first stop in the sky was the Trifid and Lagoon nebulae. This trial exposed some unforeseen issues. For one, focuser sag gradually drifted my camera out of focus as the telescope tracked across the sky: I had focused the camera during mount alignment, and after repointing to the target I left the focus unchecked. Composition was another lesson. In pitch darkness, it was very difficult to make out the sky outline through the optical viewfinder. Getting familiar with the positions of reference stars beforehand would save a vast amount of time for actual imaging.

Lagoon and Trifid

Lagoon and Trifid Nebula

During the solar eclipse road trip, I got more familiar with the setup. And yes, of course, that deserves a whole new article. But I’ve been too busy and am still not satisfied with the data processing; our team gathered a treasure trove of data. On the return journey, I shot NGC7000, the North America Nebula, in Badlands National Park. It was a perfect spot, with absolutely no light surrounding us. It’s probably one of the best dark sites in the contiguous United States.

4 hours of exposure at the darkest site yields perfect details

NGC7000 / North America Nebula

One great thing about the AZ-EQ6 is that you can mount two scopes side by side

Before the winter storms crawl in, Michigan actually gets a few clear nights every new moon phase. The problem is having them fall on a weekend. Still, I would pursue a clear night whenever I didn’t have a meeting the next day. In September I returned for the Cygnus heart region, and then the Heart Nebula after midnight.

The exposure was cut short when the clouds rolled in. This is definitely worth a retry, and I should also include the Soul Nebula. For the Cygnus heart region, I did a three-pane mosaic, and the data still requires processing.


Moving into October, IC1396, the Elephant’s Trunk, was overhead. At first I confused this target with the Rosette Nebula. It is a much larger region with a star nursery in the center; my APS-C sensor could barely cover it. Due to the light dome to the east and the emission nature of this nebula, I kept the CLS filter in. A three-hour exposure revealed lots of detail. Let the image speak for itself.


IC1396 Elephant’s Trunk in 3 hours

In December I shot the Triangulum Galaxy. The frigid weather made battery life extremely short, but at the same time there was almost no dark current on the sensor. An hour-long exposure left a clean background.

Triangulum Galaxy

In 2018 I also attempted the awesome eye of NGC7293. This object is highest in August, but due to the latitude of Michigan it stays close to the horizon and can be affected by the light dome.


Another similar planetary nebula is M57. But this one is so small that I had to enlist the 150/1200 Rumak for help.


At F12 the Ring Nebula is still bright. The hydrogen outer shell, however, requires magnitudes more exposure.

Before I left, the last target was M16, the Eagle Nebula. In this wide region, the Milky Way mixes with the HII emission nebula. Using the CLS filter made color tuning a lot more difficult, since the galactic plane is broadband in spectrum.


M16 Eagle Nebula

Now I need to find another spot amongst the light pollution of the Bay Area. The sky is clearer, but the city lights have grown significantly. A year has passed and I haven’t entirely settled down, so I took out my AstroTrac TT320X again. Far away from city lights, the Mojave Desert is truly dark!


Check out my 4+ hours long Barnard’s Loop on my AstroBin

Cheaper yet powerful camera solutions

It’s been a while since my last blog post. During the past year, I’ve built a few other cameras not yet covered on this blog. In the meantime, I have been looking into options to make this work available to fellow amateur astronomers as a viable product. One major blocker is cost. FPGAs are expensive devices for two reasons: 1. they are produced in lower volume than ASICs yet still use state-of-the-art silicon processes; 2. massive die area is dedicated to routing and configuration logic. Consider a simple comparison. The MicroZed board I’m using costs $200, with a dual-core Cortex-A9 clocked at 666MHz. Contrast that with the quad-core Raspberry Pi 3B, clocked at double the frequency, which costs only $30.

However, using these single-board-computer SoCs is not free of challenges. Most scientific CMOS sensors do not output data over standard MIPI CSI-2 interfaces and require FPGA fabric to do the conversion. Beyond that, we also need to choose an SoC whose CSI-2 interfaces support high enough total bandwidth. Considering functionality, it would be preferable to enable edge computing/storage and provide internet hosting in a single solution. In the end, we concluded the next generation should have the following connectivity:

1. 1000Base-T Ethernet and built-in WiFi support

2. USB3.0 in type-C connector

3. Fast storage with PCI-E NVME SSD

Besides these, the device should be open enough, with a Technical Reference Manual (TRM) and driver source code available for its various IP blocks. The Raspberry Pi clearly drops out due to limited CSI-2 bandwidth and the absence of fast I/O. After a lengthy and careful comparison, I landed on the Rockchip RK3399. It has dual CSI-2 receivers providing a total of 1.5GB/s bandwidth and powerful hexa-core A72/A53 CPUs running above 1.5GHz for any processing. The friendlyArm NanoPC-T4 is the most compact among all RK3399 dev kits. The board also has its I/O interfaces aligned along one edge, making case design straightforward. It is vastly cheaper than a Zynq MPSoC with similar I/O connectivity.

NanoPC T4

Two MIPI CSI2 connector on the right

Now the rest is to provide a cheap FPGA bridge between the sensor and the CSI-2 interface. The difficult part is of course the 1.5Gbps MIPI CSI-2 transmitter. On the datasheet, the 7-series HR bank OSERDES is rated at 1250Mbps. But like any other chip vendor, Xilinx derates the I/O with some conservative margin. It has been shown before that these I/Os can be toggled safely at 1.5Gbps for 1080P60 HDMI operation. Still, that is TMDS33, with a much larger swing than the LVDS/SLVS used by the MIPI D-PHY. To test this out, I put a compatible connector on the last carrier card design using spare I/Os. Because D-PHY is a mixed I/O standard running on the same wires, only the latest UltraScale+ supports it natively. To combine the low-power single-ended LVCMOS12 and the high-speed differential SLVS on a cheap 7-series FPGA, we must add an external resistor network according to Figure 10 of Xilinx XAPP894.

PCB resistor network with some rework

It is possible, though, to merge all the LP positive and negative lines respectively to save some I/O, if we only use high-speed differential signaling. In this case, toggling these LP lines switches all four lanes into HS mode simultaneously. The resistor divider ratio also had to change, because I need to share the same HR bank with LVDS25 signals from the CMOS sensor.

To produce an image, I wrote a test pattern generator that produces a simple ramp, incrementing pixel by pixel along each line. On every new frame the starting value increases by four. Timing closure was done at 190MHz for the AXI stream, which prevents FIFO underrun at 1.5Gbps over four lanes. I then took the stock OV13850 camera as the target to mimic. A simple bare-metal application runs on the PS7: it listens for I2C command interrupts, configures the MMCM clocking, sets the image size and blanking, and enables the core.
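For reference, here is a software model of that test pattern (a sketch of the behavior only; the 10-bit depth is my assumption to match the RAW format of the mimicked OV13850, not a figure stated above):

```python
def tpg_frame(width, height, frame_index, bit_depth=10):
    """Model of the TPG: each line ramps up pixel by pixel, and the
    starting value advances by four every frame, wrapping at the
    pixel bit depth."""
    mask = (1 << bit_depth) - 1
    start = (4 * frame_index) & mask
    return [[(start + x) & mask for x in range(width)]
            for _ in range(height)]
```

Comparing RAW captures on the RK3399 side against this model makes bit errors and lane swaps immediately visible.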

Finally, some non-trivial changes were needed on the RK3399 side to receive correctly. After a lengthy driver code review, I found two places requiring changes. First, the lane frequency setting in the driver, which eventually populates a V4L2 struct that affects the HS-settle timing between the LP and HS transitions. Second, the device tree entry for the number of lanes used by this sensor.

MicroZed stacked on top of the NanoPC-T4. The jumper cables are I2C

There’s a mode that disables all ISP functions to deliver RAW data, which proved extremely helpful for verifying data integrity. In the end, we won’t need the ISP for astronomical imaging anyway.

The low-power toggle plus HS settle time costs 21% overhead

Rolling ramp TPG wraps around through HDMI screen preview

This work paves the way for our ongoing full-fledged adapter board. Stay tuned for more information soon!

Phase AF CCD Die Shot

Back in 2014, we were investigating the AF/lens system at NikonHacker. To understand the operation of phase AF, some effort was put into the AF sensor itself. Leaked D1X schematics indicate that 3 linear CCDs made by Sony (ILX105 and ILX107) are incorporated into the MultiCAM-1300. In the old days, a single chip could not handle that many segments of linear pixels on a single die, so the light path had to be split and focused onto multiple chips. The same is done in the MultiCAM-2000, which uses 3 chips as well.

Then from the D200 through the D90, a single chip, the ILX148, handles all 11 focus points in the newer CAM-1000 AF system. Some teardowns serve as great resources, even showing a die photo of that sensor. Missing in between was the D70’s CAM-900. Later I came across a cheap working sensor stripped from a broken D70 and decided to take a look.



The entire module came covered in dust, clearly from a broken camera that had fallen to the ground. I tore off the 2 pieces of duct tape covering the slit between the chip and the plastic optical assembly. The opening is a metal mask outlining the light transmission boundaries of the 5 focus points.

Then I used a knife to pry off the glue on the sides, exposing the reddish epoxy adhesive between the chip carrier and the optical module. A gentle pull separated them.

The Sensor

Sensor Die

Now the AF CCD is exposed! You can see a total of 12 linear CCD segments forming 6 pairs.

Let’s look at the back side of the optical assembly to understand why.


It appears each focus point has a pair of microlenses. The center cross-type point uses 2 perpendicular linear segments, thus 4 lenslets. That gives a total of 6 pairs.

To illustrate how this works, I covered the focal plane with a piece of scratch paper and pointed the front toward a light bulb. Here’s the image.


The pattern matches the layout of linear CCD.

Now we can mimic a high-contrast target by covering 2 focus points halfway with a sticker.

You can see the 2 lenslets form copies of the 2 high-contrast edges in the 2 segments.

When this is relayed through a photographic lens, the distance between the 2 high-contrast edges varies with the defocus. Firmware A then uses some sort of cross-correlation algorithm to determine that distance, which is compared against a calibrated value to get the actual defocus amount used to drive the lens AF motor.
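A toy version of that correlation search makes the principle concrete. This is only an illustration of the idea, not Nikon’s actual algorithm; the signal values below are made up:

```python
def best_shift(ref, sig):
    """Slide `sig` across `ref` and return the shift with the highest
    correlation, a toy version of the phase-detect distance search."""
    n = len(sig)
    best, best_score = 0, float("-inf")
    for shift in range(len(ref) - n + 1):
        score = sum(a * b for a, b in zip(ref[shift:shift + n], sig))
        if score > best_score:
            best, best_score = shift, score
    return best

# A bright edge as seen by one linear segment, and the template seen
# by its paired segment; defocus displaces one relative to the other.
segment = [0, 0, 0, 5, 9, 9, 5, 0, 0, 0, 0, 0]
template = [5, 9, 9, 5]
shift = best_shift(segment, template)
```

The measured shift minus the calibrated in-focus shift gives the defocus amount that drives the AF motor.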

So much for the working principle of the phase AF optics. There’s a lot more to dig into in the ASM code of firmware A, and in the electronic interface between the AF CCD and the MCU running that code. Here I decided to desolder the CCD from the flex board. The CCD is packaged in a CLCC, whose contacts form an L-shape covering both the side and the bottom. It turned out the heat from the soldering iron dissociated the wires from the flexible board before melting the solder on the bottom, and all the contact pads on the flexible board were destroyed.

The backside of the CLCC package has the following marking.


It’s a Sony ILX127AA linear CCD. “405 R9KK” is the product batch code: “405” indicates it was made in the 5th week of 2004, around the time of the D70 and D70s.

The schematic can be obtained by tracing the wiring. In the diagram below, VREF is probably 3.3V based on the trace. SD0~3 and STB form a simple parallel command interface. CLK is the master clock input. The analog output of pixel intensity is on Vout, synchronized to SYNC.


Now we can dig into the image sensor die using a microscope. I took more than 50 shots and stitched them with panorama software. The CCD was manufactured on a very old process node, probably larger than 1 micron.


Click for Large View

The charge transfer is based on a 2-phase CCD. The total number of pixels is around 996; discounting the metal-masked pixels, this reduces to 912, so the name MultiCAM-900 makes sense. The greenish regions are the actual photodiodes. The photo-generated charge is transferred to the shaded region on the left or on top, then clocked and shifted out to the output amplifier. The three long segments are continuous, with dummy pixels between the two correlated pixel regions. The six shorter segments forming the left, center and right focus points are broken in two by the long segments, so each has its own amplifier. The CCD integrates all the command decoder/segment select/CCD driver logic on chip, as indicated by the vertical grid of synthesized transistors and their metal interconnect wires.

CMOS Camera – P7: Streaming Lossless RAW Compression

This post covers some serious stuff involving video compression. Early this year I decided to make a lossless compression IP core for my camera, in case one day I adapt it for video. And because it’s for video, the compression has to be stream-operable and real-time. That means you cannot save frames to DDR RAM and do random lookups during compression. JPEG needs to buffer at least 8 rows, as it compresses 8×8 blocks. More complex algorithms such as H.264 require even larger transient memory for inter-frame lookup. Most of these lossy compression cores consume a lot of logic resources, which my cheap Zynq 7010 doesn’t have, or fall short on performance when fitted into a small device. Besides, I would prefer lossless over a lossy video stream.

There’s an algorithm every RAW image format uses but that is rarely implemented in common image formats. NEF, CR2, DNG, you name it: it’s Lossless JPEG, defined in 1993. The process is very simple: use the neighboring pixels’ intensities to predict the current pixel you’d like to encode. In other words, record the difference instead of the full intensity. It’s simple yet powerful (up to 50% size reduction) because most of our images are continuous-tone and lack high spatial frequency detail. This prediction method is called differential pulse-code modulation (DPCM). A Huffman code is then prepended to record the number of significant bits.
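A minimal sketch of the prediction step shows why this works so well. I use the left neighbor as the predictor here for simplicity (Lossless JPEG defines several predictors, and my core’s choice may differ), and report the JPEG-style bit category SSSS, the number of bits needed for each difference:

```python
def dpcm_categories(pixels):
    """DPCM-encode a row: each pixel becomes the difference from its
    left neighbor, paired with its bit category SSSS (the count of
    bits needed for the difference magnitude; 0 when diff == 0)."""
    prev, out = 0, []
    for p in pixels:
        diff = p - prev
        out.append((diff, abs(diff).bit_length()))
        prev = p
    return out

# A smooth 12-bit row: after the first pixel, differences are tiny
row = [2000, 2002, 2001, 1999, 2050]
encoded = dpcm_categories(row)
```

In the real bit stream each pixel is stored as the Huffman code for its SSSS followed by SSSS extra bits, so the smooth pixels above cost a handful of bits each instead of the full 12.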

The block design

Sounds easy, huh? But once I decided to make it parallel and high speed, the implementation became very hard. All the later bits have to be shifted correctly to form a contiguous bit stream. Timing is especially of concern, since the number of potential bits gets large when data runs highly parallel. So I split the process into 8 pipeline stages in lockstep. 6 pixels are processed simultaneously each clock cycle. At 12-bit, the worst possible bit length is 144: 12 bits for the Huffman code and 12 for the differential code of each pixel. The result then goes onto a 64-bit bus by concatenation with the leftover bits from the previous clock cycle. A double buffer is inserted between the concatenator and the compressor, and FIFOs are employed upstream and downstream of the compression core to relieve pressure on the AXI data bus.

Now resource usage is high!

By optimizing some control RTL, the core now happily closes timing at 200MHz. Theoretically it could process at an insane rate of 1.2GPixel/s, although this sensor cannot feed that stably with four LVDS banks. Properly modified, it could suit other sensors that do block-based parallel readout. As for resource usage, many of the LUTs cannot be placed in the same slice as the flip-flops. Splitting the bit shifter into more pipeline stages would definitely reduce LUT usage and improve timing, but the FF count would shoot up to match the LUT count, so the overall slice usage would probably be identical.

During testing, I used the Zynq core to set up the Huffman look-up table. The tree can be modified between frames, so near-optimal compression can be achieved for the current scene using a statistics block I built in.
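As a software analogue of that flow, one could histogram the residual categories of a frame and rebuild the code lengths from it before the next frame. A minimal sketch (standard Huffman construction, not the firmware’s actual routine):

```python
import heapq

def huffman_lengths(histogram):
    """Build Huffman code lengths from a {symbol: count} histogram,
    e.g. the per-frame residual-category statistics collected by a
    hardware statistics block."""
    if len(histogram) == 1:                       # degenerate one-symbol case
        return {next(iter(histogram)): 1}
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(histogram.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(histogram, 0)
    uid = len(heap)
    while len(heap) > 1:
        c1, _, syms1 = heapq.heappop(heap)        # merge the two least
        c2, _, syms2 = heapq.heappop(heap)        # frequent subtrees
        for s in syms1 + syms2:
            lengths[s] += 1                       # one bit deeper for all leaves
        heapq.heappush(heap, (c1 + c2, uid, syms1 + syms2))
        uid += 1
    return lengths
```

From the code lengths, a canonical code table can be derived and loaded into the LUT between frames.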

I have now verified that the bit stream decompresses correctly with DNG/dcraw/libraw converters. The only additions are a file header and a stuffed zero byte after every 0xFF, in compliance with the JPEG stream format.
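The byte stuffing itself is trivial; a sketch of both directions (illustrative, not the firmware code):

```python
def stuff_bytes(payload: bytes) -> bytes:
    """Insert a 0x00 after every 0xFF so entropy-coded data can never
    be mistaken for a JPEG marker (0xFF followed by a nonzero byte)."""
    out = bytearray()
    for b in payload:
        out.append(b)
        if b == 0xFF:
            out.append(0x00)
    return bytes(out)

def unstuff_bytes(stream: bytes) -> bytes:
    """Reverse operation, as performed by the decoder."""
    out, skip = bytearray(), False
    for b in stream:
        if skip:
            skip = False
            continue  # drop the stuffed 0x00
        out.append(b)
        if b == 0xFF:
            skip = True
    return bytes(out)
```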


The Whitney Challenge

I don’t usually go for physical challenges. Yet Mt. Whitney holds a very special place: it’s the highest point in the contiguous United States and, beyond that, one of the few such summits accessible by trail. That makes it hugely different from, say, the highest point in China – Everest – where professional mountaineering skills are essential, plus huge bucks.

I’d been tempted to summit this mountain for a while. So back in 2014, when planning a road trip in California, I decided to stop near Lone Pine for a distant look at the mountain and to gather more detailed information. So here it is, a map from National Geographic and a picture.



The center is actually Lone Pine Peak; Whitney is right under the flag pole

It wasn’t until I got home that I found out I had photographed the wrong target – Whitney itself is of course much farther away. This gives you a glimpse of the trail’s length: a whopping 22 miles round trip! I gradually realized this wasn’t an easy task. Each year, people get injured along the way or, worse, lose their lives here. The danger is partly altitude sickness after exertion, and partly the lack of a buddy to watch over you when an accident happens.

In 2016, I met Weichen, a post-doc and an avid amateur mountaineer here in Michigan. This was the perfect chance to take on the challenge. Another big push was that so much was going on in my life that I just wanted to forget. In May this year we didn’t win the lottery for an overnight permit; a day-use slot in late July was the only option for two people.

So began our training. I picked a staircase with an elevator next to it, and every other day we set off on a simulated hike of 99 floors with weighted packs; the next day we ran for cardio. The gradual push made me comfortable with ever more weight. Three days before we left for Whitney, we peaked at 11 kilometers.

We packed all the gear needed for the hike. For your reference, a headlamp and sufficient batteries are critical, as you don’t want to get lost in the middle of the night or fall off a cliff! Trekking poles are helpful when crossing the creek and on the way down. Lastly, don’t forget food, candy and water. The website suggests 3 liters per person per day, and that really is the minimum. There is water along the way if you bring a filtration device; if not, bring plenty of water!

The Great Sand Dunes

Monument Valley

Navajo Mountain and Colorado-San Juan River Junction

Lake Powell, Page, Glen Canyon Dam, Horseshoe Bend and Antelope Canyon

Grand Canyon

At that time, flying into Las Vegas was the most affordable option. We arrived at noon the previous day. After picking up the car and filling ourselves up at a buffet, we headed toward California through Death Valley National Park.

Badwater Basin


Stovepipe Wells and Sunset

By the time we got to Lone Pine, it was already dark. We retrieved the permit from a small locker next to the visitor center, then immediately checked into a motel just to get enough sleep for the hike the next day.

Day use permit at Whitney

The sky was clear over the high Sierra early the next day. We left the motel at 3:30AM, before dawn, and after a 20-minute drive from Lone Pine to Whitney Portal we started the actual hike at 4:15. Just after the third switchback, we ran into the first trouble at the north fork of Lone Pine Creek. The water level was unusually high, forcing us to wade across barefoot; the stepping stones were mostly submerged.

Moon and Venus, looking back down the valley

The trail stayed on one side of the mountain for the next half hour until we hit Lone Pine Lake. The first crossing had log bridges, but after a while we hit another submerged section without any bridge. Once more, we took off our boots to cross.


Me crossing the log bridge near Lone Pine lake

In the next hour, we saw the first sign of snow. The unusually heavy precipitation in California last winter had left a great deal of snow to melt – and, of course, the flooding along the Whitney trail.

Daybreak above the Lone Pine Lake

Along the route the trees receded rapidly. After passing the Outpost campground below Mirror Lake, the landscape transformed into barren rock faces. Occasionally, patches of flowers were scattered between the melting snow.

The traverse is at the top of this huge snow patch

Then we encountered the first challenge: a part of the trail completely covered in snow, and we had no ice-climbing equipment, as we hadn’t expected this so late in the season. For each step forward, we made sure the trekking pole was planted solidly in the snow and the other foot wasn’t slipping at all. I was scared, since this was my first time attempting something so steep on ice.

The red arrow indicates the traverse covered in ice

At 10AM we finally arrived at Trail Camp, upstream of Consultation Lake. As we looked at the huge massif and icy slope, our chances of making the top by noon faded in a flash. Altitude had already taken effect on my body: my fingernails were starting to turn purple. We spent 15 minutes recovering with water and candy, then put on sunscreen before marching ahead to the endless 99 switchbacks that rise quickly from 12,000 feet to the trail crest at 13,600.

Looking back down at the Consultation Lake (right) and the Trail Camp Pond (left)

The last switchback, traversing a 40 degree ice slope

Had I succumbed to fear at the earlier icy traverse, I wouldn’t have had to suffer this near heart-stopping endeavor at 13,500 feet. After the last switchback, we had to traverse two more continuous patches of melting snow just before the trail crest. A 500-meter, 40-degree slope waits below for a slip at any single step. I tried not to look down, but at every step I had to, just to make sure my foot was secured in the right place.

Exhausted at the trail crest

This little guy wants my food!

At 4,100 meters, I started to feel pain in my stomach. I knew I was exhausted, and so was my buddy, but neither of us had any appetite. The weird feeling of the thin air was taking its toll! I essentially forced lunch into my mouth. By our projected timing, we would arrive no earlier than 2PM. We dropped some of our backup supplies at the John Muir Trail junction and pushed on.

The Hitchcock Lakes in the Sequoia National Park

The final 2 miles to the summit were a completely different experience. Traversing the west flank of the peaks leading to Whitney, it was dry and hot, unlike the icy north slope of the valley we had trekked up. For this portion we were left with a single bottle of water between us, so we had to conserve as much as we could.

A final few steps toward the victory!

Oh I forgot, we brought two cans of beer! Cheers!

Finally, we made it at 2:20PM. Cheers! We had brought two cans of beer, but only for photography. I took just one sip, since I knew alcohol dehydrates you even faster, and we left the other can inside the hut. We logged our names into history and then turned to some serious views!

The highest 360 panorama in the contiguous United States

(click for 360 link)

I took a 360 panorama on the top of Whitney, with love. There was absolutely no time to shoot a time lapse. We headed down immediately.

The Needles along the last 2 miles up

Started our way down at 3PM

At the trail crest on the way up, I had felt we probably would not make it to the top. Now I had the sensation of dying of dehydration before even getting back to the trail crest. Halfway down, we consumed the last drop of water. It became a race between dehydration and making it back to the trail junction, where one more liter waited for the two of us.

Some trail sections are dangerously close to the cliff edge

But to get more water, we would have to make it back down to 12,000 feet at Trail Camp, where most campers were – which meant facing that icy traverse again. We saw some daring climbers glissading down the icy slope. That was a really dangerous move, given the low friction of melting snow and the exposed rocks.

As Trail Camp got closer, the hope of surviving finally overtook the fear of death. We met someone along the route who kindly gave us filtered water. The fresh meltwater was cold, yet so refreshing! The sun had already fallen behind the massif; we would have to move faster to get back to Whitney Portal before it was dark again.

We skidded down the frozen ice of Lone Pine Creek

To save time, we followed the fresh tracks of others along the ice of Lone Pine Creek, staying away from the rocks where cracks most often form. Eventually, though, we had to leave the ice as we approached the snow line. The sun had set by the time we got close to Mirror Lake. We got back onto the trail, turned on our headlamps and continued down the mountain.

We followed the river on the way down. GPS record

The flooding was even worse after a day of snowmelt washing down in torrents. But this time, we waded straight across the creek. The ice-cold water actually acted like a painkiller on our hot, swollen ankles.

The Milky Way came out – the first time I had ever seen it in the high Sierra. It was so clear. With the calm air at high altitude, the stars barely twinkled! I decided to sit down for a five-minute rest and just enjoy the sounds of nature – waterfalls, insects, birds – and the starry night through the gaps between the tall pine trees.

We finally got back to the portal. Both our cellphone batteries had gone dry, and my buddy’s GPS watch had stopped recording between Mirror Lake and Lone Pine Lake. It was a fulfilling feeling after such an achievement: a 22-mile/36-km round trip in snowy, icy conditions without proper gear. It was close to midnight when we checked into the hotel. I took a shower and fell asleep the moment I touched the bed. My legs still hurt the next day. That morning, I finished three plates of breakfast while watching the sunlight shine on Mt. Whitney.

Sunset behind the Sierra mountain range at Owens Lake

We left Lone Pine before sunset. I really needed to treat myself to some stargazing. The next stop was Dante’s View, overlooking the Badwater Basin of Death Valley.

Along the highway, shimmering lights filled the distant void of the desert. I wondered: would I one day stand on top of that peak again, or would I never again wish to challenge myself with something like this? One thing was certain – I had left my lingering love at the top. Good night, Mt. Whitney!

The Milky Way Arch over Badwater Basin

(Click for 360)

Written on Oct 15, 2017

CMOS Camera – P6: First Light

In July I finally got the UV/IR cut filter for this camera. I designed a simple filter rack and 3D printed it. The whole thing now fits together nicely in front of the sensor. IR cut is necessary due to a huge proportion of light pollution in the near-infrared spectrum.

Filter rack

UV/IR cut taped to the plastic rack.

With all the hardware in place, I added a single-trigger exposure mode to the camera firmware, and correspondingly a protocol command in the PC software to issue a shutter release.


The camera was then attached to a SkyRover 70SA astrograph, with a 12nm-bandwidth Ha filter in the camera angle adjuster. This lets me reject most of the light pollution while imaging in front of my house. Focusing through the Ha filter is extremely difficult: I chose a bright star and pushed the liveview exposure time to maximum for focusing. Finally, before the battery pack (supplying both the AZ-EQ6 mount and my camera) went dry, I managed to obtain 15 frames of 5 minutes each.


No dark frames were used for this first-light image, and the guiding performance was exceptional – so good, in fact, that hot pixels landed in the same locations in every frame. This foiled the kappa-sigma algorithm for hot-pixel removal and left the background very noisy. Even so, NGC7000 already shows rich detail!
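For reference, kappa-sigma clipping rejects per-pixel outliers across the frame stack – which only removes a hot pixel if frame-to-frame shifts (dithering, or drift corrected at registration) move it around within the aligned stack. A minimal numpy sketch of the algorithm:

```python
import numpy as np

def kappa_sigma_stack(frames, kappa=2.5, iterations=3):
    """Average a stack of frames (N, H, W), iteratively rejecting
    samples more than kappa*sigma from the per-pixel mean. With
    perfect guiding and no dithering, a hot pixel is 'hot' in every
    frame, so nothing gets rejected -- the failure described above."""
    stack = np.asarray(frames, dtype=np.float64)
    mask = np.ones_like(stack, dtype=bool)
    for _ in range(iterations):
        n = np.maximum(mask.sum(axis=0), 1)
        mean = np.sum(stack * mask, axis=0) / n
        sigma = np.sqrt(np.sum(((stack - mean) ** 2) * mask, axis=0) / n)
        # keep samples within kappa*sigma (epsilon handles sigma == 0)
        mask = np.abs(stack - mean) <= kappa * sigma + 1e-12
    return np.sum(stack * mask, axis=0) / np.maximum(mask.sum(axis=0), 1)
```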


1. This sensor has higher dark current than Sony CMOS – somewhat more than 4-fold at the same temperature. However, its doubling temperature is small; in other words, its dark current drops off quickly with cooling. Last time I observed no dark noise at –15C. Imaging the Horsehead during winter here in Michigan would thus be brilliant!

2. Power issues. The sensor consumes ~110mA @ 5V during long integration, compared to ~400mA for continuous readout – which is minimal. However, the Zynq SoC + Ethernet PHY consume much more than a fully running CMOS sensor, so some power-saving techniques could be employed: CPU throttling during long integration/standby, powering down the fabric in standby mode, moving the bulk of the RTOS into OCM instead of DDR, etc. But many of these require substantial work.
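The doubling-temperature behavior in point 1 is easy to quantify: dark current roughly doubles every fixed temperature interval Td, so a small Td means cooling pays off quickly. A sketch with illustrative numbers (not measured values for this sensor):

```python
def dark_current(i_ref, t, t_ref=25.0, t_double=5.0):
    """Dark current at temperature t (degC), given the current i_ref
    at reference temperature t_ref and doubling temperature t_double.
    All parameter values here are assumptions for illustration."""
    return i_ref * 2 ** ((t - t_ref) / t_double)

# With a 5C doubling temperature, cooling from 25C to -15C spans
# 8 doubling intervals, i.e. a 2**8 = 256x reduction in dark current.
```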


Anyway, I’m going to use this during the solar eclipse here in the USA!

CMOS Camera – P5: Ethernet Liveview

To make camera control easier, I spent the last several weeks building a control scheme over Ethernet. The camera acts as a server, with LWIP tasks running on the FreeRTOS operating system; the client is my computer, on any OS platform. The only thing connecting the two is a 1G Ethernet cable. To speed things up, the client demo program is written in Python 3.


Client application based on Tkinter

Once the RTOS boots up, a core task sets up the network and opens a listening port. On the client side, all control commands are sent over TCP once the connection is established. At the application layer there’s really not much protocol going on: I chose to decode each command as a magic code followed by the actual command ID. Four commands are established so far:

1. Send Setting

2. Start Capture (RTOS will create the CMOS run task)

3. Halt Capture

4. Send Image

Once the TCP handshake is done, the client can send commands 1 and 2 to begin video capture with the defined settings. During capture, command 4 retrieves the latest image to decode and display on the GUI. The camera settings include exposure time and gain, frame definition and on-chip binning, shutter mode and ADC depth, as well as many other readout-related registers.
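In client code, such a magic-code-plus-ID scheme is only a few bytes on a TCP socket. A hypothetical sketch – the magic value, byte layout and port below are made up for illustration, not the camera’s actual protocol:

```python
import socket
import struct

MAGIC = 0xCAFE  # illustrative magic code, not the real one
CMD_SEND_SETTING, CMD_START, CMD_HALT, CMD_SEND_IMAGE = 1, 2, 3, 4

def send_command(sock, cmd_id, payload=b""):
    """Frame a command as <magic><cmd_id><payload> and send it."""
    sock.sendall(struct.pack(">HH", MAGIC, cmd_id) + payload)

def connect_and_start(host, port=5000, settings=b""):
    """Connect, push settings (command 1), then start capture (2)."""
    sock = socket.create_connection((host, port))
    send_command(sock, CMD_SEND_SETTING, settings)
    send_command(sock, CMD_START)
    return sock
```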

The images are transferred as RAW data, which is linear, so numpy functions become very helpful for implementing level control and post-readout binning. RAW images can also be written to disk as RAW video, given fast enough I/O.
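Assuming a 16-bit-container RAW frame, both operations are a few lines of numpy; a sketch:

```python
import numpy as np

def bin2x2(raw):
    """Post-readout 2x2 binning by summing neighbors (trades
    resolution for SNR). Height and width must be even."""
    h, w = raw.shape
    return raw.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))

def apply_levels(raw, black, white):
    """Linear level stretch of the RAW data to 8 bits for display."""
    scaled = (raw.astype(np.float32) - black) / (white - black)
    return (np.clip(scaled, 0, 1) * 255).astype(np.uint8)
```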

Several improvements are in progress. First and foremost is Ethernet performance. In a direct point-to-point connection, there really shouldn’t be any reliability issues, and according to my tests TCP can achieve ~75MB/s over Gigabit Ethernet. UDP would be even faster but might have to cope with potential packet drops. Either way, TCP is able to handle 24FPS 1080p liveview, although both server and client need optimization. Other issues include the file-saving task on the RTOS and better long-exposure control.
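As a sanity check on that claim, the payload rate needed for uncompressed 1080p RAW liveview can be computed directly (assuming 12-bit pixels packed at 1.5 bytes each):

```python
def raw_stream_rate_mb(width, height, fps, bytes_per_pixel):
    """Payload rate in MB/s (1 MB = 1e6 bytes) for an uncompressed
    RAW video stream."""
    return width * height * bytes_per_pixel * fps / 1e6

# 12-bit packed (1.5 B/px): 1920*1080*1.5*24 ~= 74.6 MB/s, which just
# fits within the ~75 MB/s measured over TCP. Unpacked 16-bit words
# (2 B/px) would need ~99.5 MB/s and exceed it.
```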

Update 6/24

Some updates on the onboard operating system.

1. By modifying the socket API, I incorporated the zero-copy mode of TCP operation. The pointer to the data memory is passed directly to the EMAC task and no stack memcpy is involved. This provides a 15% bandwidth gain under TCP operation; the top payload speed is around 70MB/s.

2. I added an interrupt event to the SDIO driver to avoid polling the status register. I/O thus no longer wastes CPU cycles, and the single core can keep running the EMAC listening task. As a result, SD file I/O can be performed simultaneously with video liveview.

Microscopic survey of recent image sensors

Last year, through cooperation with ChipMod studio, we obtained die-shot images of multiple recent Sony sensors, and in this post we’re going to show some of them. Most of our device identification is based on teardowns from various reliable sources, such as Chipworks and manufacturer repair manuals, or on direct microscopic imaging. Where inference is required, it relies on die signatures such as the number of bond pads and their relative locations – what we call the “bond pad signature”.


Let’s begin with the first one, the IMX038AQE from the Pentax K-x/K-r. It’s the same silicon die as the AQQ variant seen in the Nikon D90 and D5000 DSLRs.

SONY and device marking code IMX038

Layer number during photolithography of Bayer pattern and on chip lens (OCL)

Factory die-level testing left probe scratches on the test pads

Next, let’s take a look at the IMX071AQQ from D5100.

No device marking was found on the die except “SONY”

Bayer layer mark. PAC appears to be Photo Activated Chemical based on patents

Factory test pads

Finally, we have the IMX094AQP from the D800/D800E. The first image shows the alignment mark near the die boundary. Interestingly, Nikon customized the cover glass into a quartz anti-moiré layer; as advertised by Nikon, both the D800 and the E variant include the vertical-separation glass. The glass appears to be specially AR coated only over the image area, not across the whole plate. We have never seen this on any other Sony sensor, not even the IMX128.

Alignment marks shows duplicated image in vertical direction

Edge of the multilayer AR coating shows uneven gradient

Similar to the 071, Sony did not imprint a device marking in the corner. However, I found a pair of mask numbers related to this device – MM094L and MM094R – on the long edges of the silicon die. This pair of marks appears only on Sony full-frame sensors; we later found it on the IMX235 and IMX128 as well. Based on their locations, I realized they could be mask codes for a stitching pair. A full-frame sensor is simply too big to fit within the stepper’s imaging field, so to expose a full sensor a pair of masks has to be used, much like shooting a panorama. This was the case for the IMX028, where I discovered the resulting non-uniformity in its flat-field image.

The microscope I had access to has a 40x objective, but its working distance is too short to image directly through the sensor cover glass. With the permission, and at the request, of ChipMod studio, I’ll show some more enlarged images of the pixels themselves.

One interesting sensor was the X-Pro1 CMOS, which harbors a Sony marking but, again, no actual device code.

Xpro-1 IMX165

Xpro-1 IMX165

The corner of Fujifilm X-trans array

Through the central opening on the back of the PCB, the package marking reads X165A?, where the unreadable character is presumably an R, P or F. It is possibly the IMX165AFE, based on IC part searches where many distributors have such an entry in their listings. Sony usually uses that letter to denote the color filter type, with Q for RGB Bayer and L for mono; F would naturally mean a different pattern like X-Trans. The die itself appears to be the same as the 16MP IMX095 found in the Sony NEX-F3 and Pentax K-01.

Fujifilm CMOS PCB


Pentax K-01 uses CLCC version IMX095AQE

It’s possible that Sony kept the underlying circuitry fixed, altering only the last few steps of the back end of line (BEOL) to pattern a different color filter array. This would significantly reduce cost by avoiding a new sensor design. So the question is: when will we see a native monochromatic CMOS in APS-C or larger format?

Next we have a big one: the IMX235AQR from the Sony A7S, which harbors 12MP on full frame at around 8.5um pixel pitch. ChipMod obtained the following image during a mono conversion – in essence, scraping away the microlens and Bayer layers. The pixel opening is super wide, with about 55% area fill factor at the metal-3 layer.

50x objective view of the Metal 3 layer after Bayer removal


The microlens array appears to be shifted towards the top left of the pixel boundary

We also surveyed the IMX183 BSI sensor. Surprisingly, a BSI sensor also has a grid on the light-sensitive side. From a literature search, this grid reduces color crosstalk between adjacent pixels: on a BSI sensor, light can easily stray into the collecting well of the next pixel as the fill factor gets larger and the incidence angle gets shallower. That is also the reason a microlens array is employed to focus light rays onto the pixel center.


IMX183 BSI pixel boundary grid

At the end, we take a look at the old-school interline CCDs: the ICX413 in the Pentax K-100.

And the ICX493, which uses rotated horizontal-vertical transfer registers.


The ICX493 employs a four-phase CCD, with two pixels covering one period, so readout is interlaced. Charge on odd and even columns is transferred upward-then-right or downward-then-left to the respective HCCD (organized vertically) on each side for readout. The same is then repeated for the interlaced rows.