Introduction

From the invention of photography until the early 1990s, the function of a camera was to record analog images. With the development of solid-state focal planes and digital coding, however, this function changed, and modern digital cameras act as transceivers that transform massively parallel optical data streams into serially coded electronic data that is processed one pixel at a time1,2,3. While this paradigm shift introduced numerous advantages, the power consumption associated with electronic digital processing, along with limits on data transmission rate and storage capacity, are now the major bottlenecks limiting image data acquisition rates1,2,3,4. In contrast, compact lens designs are capable of resolving greater than 10 gigapixels of transverse resolution5,6, while advances in multimodal imaging systems capable of acquiring spectral, polarization, temporal, and range information could enable future imaging systems to acquire \({10}^{12}\) (tera) pixels of data per second. However, the power consumption and resulting coding capacity of opto-electronic transceivers is one of the primary barriers to achieving such systems1,2,3.

In current digital electronics-based image processing systems, electrical power consumption is proportional to the number of mathematical operations performed on each pixel1. Conventional image signal processing (ISP) systems perform 100-1000 operations per pixel to first condition (e.g., compensate for pixel non-uniformity, hot pixels, denoising, etc.) and then compress the image data stream, resulting in a per-pixel energy cost of 0.1-1 microjoule1. Refs. 1,2,3 show that in high-resolution gigapixel imaging systems operating at 30 frames/s, the image sensors draw 100 milliwatts/megapixel, whereas the backend digital image processing pipeline draws approximately 10X more power, amounting to ~1000 milliwatts/megapixel. This implies that processing 1 terapixel of data per second would result in power consumption of around 1 megawatt, which is prohibitive for many applications. In addition, while image compression is required for most remote sensing applications, many of these pixel conditioning operations are performed at the front end regardless of whether they are necessary for a given application. Recently, different ISP approaches, such as blind sensor-head compression, were proposed to reduce the number of operations per pixel1. Blind compression, in this context, can be understood as implementing the first layer or two of a deep neural network-based auto-encoder on the read-out data stream. While this approach substantially reduces the number of operations per pixel, the per-pixel power cost still remains unacceptably high, at ~0.01-0.1 microjoule1.

Future Terapixel/second imaging systems (e.g., imaging systems with \({10}^{9}\)-pixel (gigapixel) resolution and frame rates of \({10}^{3}\) Hz, or imaging systems with higher pixel resolution but slower frame rates) will require an alternative ISP approach with dramatically higher throughput and lower power consumption. In this work, we explore the potential for analog photonics to improve throughput and power consumption in an image acquisition pipeline after image formation on the focal plane array. This is in stark contrast to the significant body of work focused on image classification, inference, or compressed sensing of the raw scene information that operates by processing the scene data directly in the analog optical domain at the image acquisition stage4,7,8,9,10,11. While both schemes are valuable, we believe there are several important advantages for our approach, which has been relatively under-explored. First, conventional imaging optics and focal plane arrays are highly optimized, and it is difficult to improve on their baseline performance, particularly in wavelength regimes where high-resolution focal plane arrays are available. As a result, our approach does not attempt to alter the original image formation process. Instead, we designed an accelerator to address the two main bottlenecks limiting persistent, high-data-rate image acquisition: power consumption and compression speed. Second, by positioning the accelerator after the image formation and initial optical-to-electrical conversion step, this scheme is immediately compatible with any image acquisition system, regardless of operating wavelength, camera resolution, front-end optics, frame rate, or application. There is no added insertion loss before the initial detection stage, enabling compatibility with low-light and high-speed imaging applications. Perhaps most significantly, our approach works with either ambient illumination imaging or active illumination, whereas many of the compressive imaging schemes that operate at the image acquisition stage require active illumination with a coherent source7,8, which severely limits the application space.

Instead of modifying the original image formation process, our approach builds on the neural network framework proposed in the blind compression work1, which allows the key front-end image processing task to be accomplished using a single matrix-vector multiplication. Fortunately, analog optical computing is particularly well-suited for this type of operation and is being explored for a variety of matrix-multiplication-heavy computing applications12,13,14,15. The key advantage is that optical computing engines can perform matrix multiplication with energy consumption that scales linearly with the dimension of the input dataset (N) as opposed to the quadratic scaling (\({N}^{2}\)) inherent to electronic approaches. In addition, optical computing engines are able to process N pixels in parallel with an overall speed that can exceed 100 GHz, limited only by the speed of the optical modulators and photodetectors used to encode the input data and record the result16,17,18.

In this work, we show that an optical image processing engine can take advantage of these unique features to enable high-speed image compression with the potential for orders-of-magnitude lower power consumption than current techniques. Our approach is based on a passive, CMOS-compatible silicon photonics device that performs the matrix-vector multiplication required for front-end image processing. We experimentally demonstrate image compression with a ratio of 1:4 and develop a back-end neural network capable of reconstructing the original images with an average peak signal-to-noise ratio (PSNR) ~ 25 dB and structural similarity index measure (SSIM) ~ 0.9, comparable to the common electronic software-based lossy compression schemes such as JPEG19. By processing modest-sized kernels in series, our approach is inherently scalable and could process high-frame rate, limited region-of-interest image data, or Gigapixel images at slower data rates. Finally, by constructing the optical image compression engine on a silicon photonics platform, this approach meets the size-weight-and-power and integration requirements for a wide range of applications including surveillance, microscopy, machine vision, astronomy, or remote sensing. Analysis of the throughput and power consumption for our optical image processing engine indicates that this technique has the potential to encode Terapixel/second data streams utilizing <100 femtojoules per pixel—representing a > 1000x reduction in power consumption compared with the state-of-the-art electronic approaches. This improvement in throughput and power consumption is made possible by parallelization of pixel processing and transferring the majority of the image processing and conditioning tasks (e.g., denoising, linearization, etc.) from the front-end encoding interface to the back-end decoding interface. This trade-off is particularly attractive for remote sensing and imaging applications in which image reconstruction is only performed on a need-to-know basis and is usually conducted at a remote cloud site with better access to power.

Results

Operating principle of the image encoder

The optical image compression scheme relies on an auto-encoder neural network framework in which the compressed image is naturally formed at the “bottleneck” layer in a neural network20. This approach has gained traction in recent years due to its ability to simultaneously perform data compression, dimensionality reduction, and denoising20,21,22. In our implementation, the first half of the neural network (mapping the original image data to the compressed image at the bottleneck layer) is performed optically, while the second half of the network (reconstructing the image) is performed using digital electronics. A schematic of the optical encoder and corresponding neural network structure is shown in Fig. 1. Our hybrid optoelectronic auto-encoder takes advantage of the fact that most neural encoding networks are not very sensitive to the details of the feature map (i.e., the weights and connections) implemented by the first few layers. In fact, researchers have shown that the first few layers can often be assigned random weights without compromising performance23. This allows us to use a pre-designed, passive photonic device to perform the transform used in the first layer of the auto-encoder network. In this case, we designed the photonic layer to perform local kernel-like random transforms on small blocks of the image at a time. This random encoding scheme was selected based on compressive measurement theory, which has shown that random transforms are ideal for a variety of dimensionality reduction and compression tasks24,25,26,27,28. However, unlike conventional compressed sensing measurements, which are applied at the data acquisition stage11, we are addressing the problem of dimensionality reduction and data compression after image formation.

Fig. 1: Working principle of the photonic image encoder.

The silicon photonics-based all-optical image encoder consists of a series of N single-mode input waveguides which carry the pixel information of the image in the optical domain, representing a \(\sqrt{N}\times \sqrt{N}\) pixel block of the input image. The input waveguides connect to a multimode silicon waveguide region followed by a disordered scattering region, which encodes the input through a local random transformation for image compression. The encoded output, a random-looking, spatially varying intensity pattern, is binned into M non-overlapping spatial regions (corresponding to M detectors) such that M < N, and the entire image is compressed through a series of these block-wise transforms. While the compression is performed optically, the reconstruction and image conditioning steps are performed electronically at the backend.

As shown in Fig. 1, the silicon photonics-based image encoder consists of N single mode input waveguides, each with a dedicated modulator, followed by a multimode waveguide region, a random encoding layer, and M photodetectors. A laser (not shown) provides light with equal amplitude to the N input waveguides where the N modulators encode a \(\sqrt{N}\times \sqrt{N}\) pixel block of the input image onto the amplitude of light transmitted through each waveguide. Light from each waveguide is then coupled into a multimode waveguide region before scattering through the random encoding layer and finally reaching the photodetectors. The random encoding layer consists of a series of randomly positioned scattering centers fabricated by etching air holes in the silicon waveguiding layer (see the “Methods” section for more details). Since the optical device operates in the linear regime, we can describe the encoding process using a single transmission matrix (T), which relates the input (I) to the transmitted output (O) as O = TI, where I is a N × 1 vector, O is a M × 1 vector, and T is a \(M\times N\) matrix. By forcing M to be less than N, the device effectively performs a single matrix multiplication in order to compress an N pixel block of the original image into M output pixels. Since the random encoding layer is entirely passive, this compression process can be extremely fast, operating on N pixels in parallel at speeds limited only by the modulators and photodetectors. In addition, the energy consumption scales linearly with N (i.e., the number of modulators), even though the device performs \(M\times N\) operations.
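To make the block-wise encoding model concrete, the following sketch (a minimal NumPy illustration rather than the fabricated device; the 8 × 8 block size, 1:4 ratio, and random T statistics are assumptions chosen for the example) applies O = TI to each pixel block of a grayscale image:

```python
import numpy as np

def compress_image(image, T):
    """Simulate block-wise compression O = T @ I for each sqrt(N) x sqrt(N) block.

    image: 2D array (H x W) with values in [0, 1]; H and W divisible by the block size.
    T: (M x N) intensity transmission matrix (measured or randomly generated).
    Returns an (M, H/b, W/b) array holding the M detector outputs for each block.
    """
    M, N = T.shape
    b = int(np.sqrt(N))                              # block edge length, e.g., 8
    H, W = image.shape
    out = np.zeros((M, H // b, W // b))
    for i in range(0, H, b):
        for j in range(0, W, b):
            I_vec = image[i:i + b, j:j + b].reshape(N)   # N-element input vector
            out[:, i // b, j // b] = T @ I_vec           # M-element compressed output
    return out

# Example: random real-valued T with N = 64 inputs and M = 16 outputs (1:4 ratio)
rng = np.random.default_rng(0)
T = rng.uniform(0, 1, size=(16, 64))
image = rng.uniform(0, 1, size=(512, 512))   # placeholder for a real grayscale image
compressed = compress_image(image, T)        # shape (16, 64, 64)
```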

To further clarify the role of our data compression system, we can consider the entire image acquisition/compression process in 4 steps:

(1) Conventional imaging optics form an image on the focal plane array of the camera.

(2) Conventional focal plane array detectors convert the analog optical image to the electrical domain.

At this point we have two options:

(3a) Most commercially available cameras are designed to digitize the image data recorded on the focal plane array. Using this type of camera, we would use a digital-to-analog converter (DAC) to drive the optical modulators on-chip, re-encoding the image information onto an optical carrier.

(3b) Focal plane arrays that provide a direct analog output are also commercially available29. This analog output could be used to directly drive the optical modulators, re-encoding the image information without an intermediate digitization step. In this case, a transimpedance amplifier (TIA) with appropriate gain converts the analog photocurrent into the voltage that drives the re-encoding modulators.

(4) The on-chip photonic encoder then performs high-speed, low power compression and the output from the detectors on-chip is digitized and stored for off-line image reconstruction.

Using focal plane arrays that provide an analog output (option 3b) has the potential to significantly reduce the overall power consumption and improve throughput by avoiding the intermediate analog-to-digital conversion (ADC)/DAC steps. However, our underlying approach (the photonic chip performing compression) is compatible with both options, which is useful given the ubiquity of cameras with integrated ADCs.

The local kernel size, N, is a key parameter driving the performance of the photonic image processing engine. While using smaller kernel-like transforms reduces the data throughput, since the device can only compress N pixels at a time, smaller kernels also have several advantages. First, local transforms maintain the spatial structure of the original image, which tends to improve the image reconstruction, as we discuss in the next section. Second, the kernel approach can be used to compress arbitrarily large images without requiring a corresponding increase in the number of modulators and detectors. Third, using these local transforms helps to isolate noise from a given pixel (e.g., a hot pixel), which could otherwise spread across the entire compressed image. Finally, since this compression scheme effectively maps the input image blocks to speckle patterns, using a large kernel could lead to low-contrast speckles, which can also degrade the image reconstruction, similar to the trend observed in speckle-based spectrometers30.

Effect of kernel size and kernel type on image compressibility

To determine the effect of kernel size on image compression, we performed numerical simulations of the image compression and reconstruction process using images taken from the DIV2K and Flickr2K dataset31,32 and synthetically generated random T matrices. To reduce the computation time, the images were converted to grayscale and cropped to a resolution of 512 \(\times\) 512 pixels. The dataset consisted of 4152 images divided into a 3650-image training dataset and a 502-image validation dataset. In this case, the compression process was simulated by multiplying each \(\sqrt{N}\times \sqrt{N}\) block of an image by a numerically generated random matrix consisting of real, positive numbers uniformly distributed between 0 and 1. We then trained a neural network to reconstruct the original image from the compressed image (see “Methods” for a detailed description of the neural network architecture and training routine). Finally, we used the test images from DIV2K and Flickr2K dataset to evaluate the reconstructed image fidelity after compression using kernels of varying sizes.
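For reference, the reconstruction fidelity metrics used throughout this section can be computed as in the sketch below, assuming scikit-image provides the PSNR and SSIM implementations (the placeholder arrays merely stand in for an actual original/reconstructed image pair):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_reconstruction(original, reconstructed):
    """Compute the PSNR (dB) and SSIM used to score reconstructed images."""
    psnr = peak_signal_noise_ratio(original, reconstructed, data_range=1.0)
    ssim = structural_similarity(original, reconstructed, data_range=1.0)
    return psnr, ssim

# Example with a placeholder "reconstruction" (original plus a small perturbation)
rng = np.random.default_rng(1)
original = rng.uniform(0, 1, size=(512, 512))
reconstructed = np.clip(original + 0.01 * rng.standard_normal(original.shape), 0, 1)
print(evaluate_reconstruction(original, reconstructed))
```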

An example image from the set of test images is shown in Fig. 2a along with the compressed images obtained using 8\(\times\)8 (Fig. 2b) and 32\(\times\)32-pixel kernels (Fig. 2e). In this case, we fixed the compression ratio \((M:N)\) at 1:8 and the images were compressed from 512\(\times\)512 to 8\(\times \left(64\times 64\right)\) or \(128\times \left(16\times 16\right)\) pixel datacubes. The reconstructed images using the two kernel sizes are shown in Fig. 2c, f. Using a smaller kernel size clearly retains more of the spatial structure in the compressed image (Fig. 2b), resulting in a higher fidelity reconstruction. The average peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) of the reconstructed images in the test image dataset are shown in Fig. 2d as a function of kernel size. We found that smaller kernels generally result in higher-quality image reconstruction. Unlike spatially uncorrelated and sparse data—which can be efficiently compressed using large random matrices—image data naturally includes spatial structure and spatial correlations which are important to maintain. In general, the optimal kernel size will depend on the type of images being compressed and will be impacted by factors such as the sparsity and spatial frequency content of the images.

Fig. 2: Effect of kernel size on image compression and reconstruction using images from the DIV2K and Flickr2K dataset31,32.

a One of the original (512 × 512 pixel) grayscale noiseless images taken from the DIV2K and Flickr2K dataset that is used to demonstrate compression. b, e Compressed images using an 8 × 8 (b) and 32 × 32-pixel kernel (e). For both cases, the compression ratio (M:N) is fixed to be 1:8, where in (b) N = 64 and M = 8 and in (e) N = 1024 and M = 128. The sizes of the compressed images are shown in brackets. c, f Reconstructed images (512 × 512-pixel) from the compressed images shown in (b) and (e), respectively. d Mean PSNR (red) and SSIM (blue) of the reconstructed images from the test data as a function of the kernel size. The error bars correspond to 1 standard deviation.

For benchmarking, we compared our compression technique using an 8\(\times\)8 kernel and a 16\(\times\)16 kernel to standard JPEG compression (which uses an 8\(\times\)8 kernel) on images from the same test dataset (DIV2K and Flickr2K). Figure 3a, b compares the average PSNR (a) and SSIM (b) of the test image dataset obtained using JPEG compression to our encoding scheme as a function of compression ratio. As shown in Fig. 3a, b, our approach provides slightly lower PSNR/SSIM than JPEG at low compression ratios (e.g., <1:16) and comparable PSNR/SSIM at intermediate ratios (1:32 and 1:64). Our approach also enables higher compression ratios than are possible using JPEG (which is limited to a ratio of 1:64) and maintains an acceptable average PSNR > 20 dB up to ratios of 1:256. Furthermore, unlike photonic compression, which enforces a fixed compression ratio, not all images from the test dataset could be compressed to a ratio of 1:64 using JPEG compression (only ~400 out of 500 test images could be compressed up to 1:64). Figure 3c, d shows the compression-ratio dependence of the same example image as in Fig. 2a reconstructed after compression using the photonic approach and using JPEG. For this image, the highest compression ratio that we could achieve using the JPEG algorithm is 1:45 (see Supplementary Information: Section S6. Comparison of JPEG and photonic compression for more examples). At ratios of 1:8 and 1:16, both techniques provide excellent quality images. At higher compression ratios (>1:32), JPEG compression introduces pixelation artifacts, whereas the photonic compression scheme loses some of the higher spatial frequency content. These differences result from fundamental distinctions between the two compression schemes.

Fig. 3: Comparison of photonic image compression to digital JPEG image compression using images from the DIV2K and Flickr2K dataset31,32.

a Comparison of mean PSNR of the reconstructed images from the test dataset (sample size = 500 test images) and images compressed using the JPEG algorithm as a function of compression ratio. The error bars correspond to 1 standard deviation. The blue (red) line corresponds to photonic compression with an 8 × 8 (16 × 16) kernel. The black line corresponds to JPEG compression. The data for a compression ratio of 1:64 for JPEG compression has a smaller sample size of only 400 images, as ~100 test images could not be compressed beyond 1:45 using the JPEG algorithm. b Comparison of mean SSIM of the reconstructed images from the test dataset (sample size = 500 test images) and images compressed using the JPEG algorithm as a function of compression ratio. The error bars correspond to 1 standard deviation. The blue (red) line corresponds to photonic compression with an 8 × 8 (16 × 16) kernel. The black line corresponds to JPEG compression. The data for a compression ratio of 1:64 for JPEG compression has a smaller sample size of only 400 images, as ~100 test images could not be compressed beyond 1:45 using the JPEG algorithm. c Reconstructed images that were compressed using the photonic compression approach with a 16 × 16 kernel and compression ratios ranging from 1:8 to 1:256. The original test image is shown in Fig. 2(a). The corresponding PSNR/SSIM values are stated along with the images. d Compressed images using the JPEG algorithm with compression ratios ranging from 1:8 to 1:45. The original test image is shown in Fig. 2(a). The corresponding PSNR/SSIM values are stated along with the images. The maximum compression ratio that can be achieved using JPEG compression for this image is ~1:45.

The JPEG compression algorithm applies a discrete cosine transform (DCT) to every 8 × 8-pixel block in an image19. It then applies a thresholding operation to store the most significant basis functions. While this nonlinear, data-dependent transformation approach results in high-quality compression for low compression ratios or for sparse images, it has several drawbacks compared to the photonic compression scheme introduced here, particularly for high data-rate imaging applications. First, JPEG compression ratios are image-dependent. This can be problematic for high-data-rate image acquisition, which would have to allocate variable-sized memory blocks as the compression ratio changes. Second, JPEG performs many more operations on each pixel than our photonic scheme, since it performs a full DCT transformation on every 8 × 8-pixel block before selecting the basis functions to retain. This results in increased power consumption and slower throughput, since it requires multiple clock cycles. Finally, JPEG is unable to perform denoising or image conditioning. As described in the experimental section below, the photonic scheme is able to simultaneously perform image compression, image conditioning (pixel linearity, hot pixels, etc.), and denoising by using a neural-network-based non-linear decoding scheme.
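For comparison, the transform-and-threshold idea behind JPEG can be sketched as follows. This is a simplified stand-in that keeps the K largest DCT coefficients per 8 × 8 block, whereas real JPEG uses quantization tables and entropy coding:

```python
import numpy as np
from scipy.fft import dctn, idctn

def jpeg_like_block(block, keep=8):
    """Apply a 2D DCT to an 8x8 block and keep only the `keep` largest coefficients.

    This mimics the transform + thresholding idea behind JPEG; real JPEG applies
    quantization tables and entropy coding rather than a hard top-K threshold.
    """
    coeffs = dctn(block, norm='ortho')                 # 64 basis-function coefficients
    threshold = np.sort(np.abs(coeffs).ravel())[-keep] # magnitude of the keep-th largest
    sparse = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)
    return idctn(sparse, norm='ortho')                 # lossy reconstruction of the block

block = np.random.default_rng(2).uniform(0, 1, size=(8, 8))
approx = jpeg_like_block(block, keep=8)                # retains ~1/8 of the coefficients
```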

In addition to kernel size, we also investigated two types of random kernels. In Figs. 2 and 3, we presented simulations using synthesized random T matrices that were real and positive. This simulated the compression process if light coupled to each of the input waveguides was effectively incoherent. For example, a frequency comb or other multi-wavelength source could be used to couple light at different wavelengths into each waveguide (to be more quantitative, light in each waveguide should be separated in frequency by at least ~10X the detector bandwidth to minimize interference effects)33. In this case, the speckle patterns formed by light from each waveguide would sum incoherently on the detectors and the compression process can be modeled using a random T matrix that is real-valued and non-negative. The second case we considered is using a complex-valued field T matrix, in which each element in T was assigned a random amplitude and phase. In this case, the compressed image was obtained as the square-law detector response: \(O=\left(T\sqrt{I}\right){\left(T\sqrt{I}\right)}^{*}\). This case simulates the effect of coupling a single, coherent laser to all the input waveguides at once such that the measured speckle pattern is formed by interference between light from each waveguide.
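A minimal sketch of the coherent (complex-valued) case, in which the detected intensity is the squared magnitude of the field produced by the complex transmission matrix (the matrix statistics below are illustrative assumptions, not measured values):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 64, 8

# Complex-valued field transmission matrix: random amplitude and phase per element
T_field = rng.uniform(0, 1, (M, N)) * np.exp(1j * rng.uniform(0, 2 * np.pi, (M, N)))

I = rng.uniform(0, 1, N)            # pixel intensities of one sqrt(N) x sqrt(N) block
field_out = T_field @ np.sqrt(I)    # input field amplitudes are the square roots of the intensities
O = np.abs(field_out) ** 2          # square-law detection: O = (T sqrt(I)) (T sqrt(I))*
```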

To evaluate the trade-off between real and complex transforms, we evaluated the reconstructed image quality at varying noise levels. In general, noise could be introduced in the original image formation process (e.g., due to low light levels or imperfections in the imaging optics), through the camera opto-electronic conversion process (e.g., due to pixel non-linearity or the limited bit depth of the camera pixels), or through the optical compression process described in this work (e.g., due to laser intensity noise, environmental variations in the T matrix, or simply shot noise at the detection stage). To simulate the effect of noise on the reconstruction of the compressed images, we numerically added gaussian white noise to the compressed images. Figure 4a, d shows the same test image evaluated in Fig. 2, compressed by a factor of 8X (compression ratio 1:8) using either a real- or complex-valued 8 × 8 pixel T matrix. In this case, gaussian white noise with an amplitude equal to 2% of the average signal level in the image (corresponding to an SNR of 50) was added to each compressed image. The reconstructed images using the real and complex \(T\) matrices are shown in Fig. 4b, e. At a 2% noise level (SNR = 50), the reconstructed images are only marginally worse than the reconstructed image without noise shown in Fig. 2c (PSNR = 25.1 dB with noise vs PSNR = 26.9 dB without noise for the case of a real transform). This result confirms that the autoencoder framework is relatively resilient to noise, which is consistent with prior applications of autoencoders for denoising tasks. This resilience could also enable the system to forego the energy-intensive image conditioning by encoding raw image data and relying on the back-end neural network to compensate for noise due to effects such as pixel non-uniformity. In Fig. 4c, f, we present the average PSNR and SSIM for reconstructed test images that were compressed using either real-valued or complex-valued T matrices, as a function of the SNR of the compressed images. These simulations showed that at relatively high SNR (>50), the real- and complex-valued T matrices provided comparable performance. However, at lower SNR, the complex-valued T matrices provided more robust image compression due to the higher contrast in the compressed images obtained using a complex \(T\) matrix.
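The noise model used in these simulations can be sketched as follows, with the noise amplitude set by the target SNR relative to the mean signal level of the compressed data (e.g., SNR = 50 corresponds to the 2% case above); the datacube shape below is illustrative:

```python
import numpy as np

def add_noise(compressed, snr, seed=None):
    """Add gaussian white noise with amplitude equal to mean(signal) / snr."""
    rng = np.random.default_rng(seed)
    sigma = compressed.mean() / snr        # e.g., snr = 50 gives ~2% of the mean signal
    return compressed + sigma * rng.standard_normal(compressed.shape)

# 8 x (64 x 64) compressed datacube (1:8 ratio, 8x8 kernel) with SNR = 50 noise added
noisy = add_noise(np.random.default_rng(6).uniform(0, 1, (8, 64, 64)), snr=50)
```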

Fig. 4: Effect of kernel type (real vs complex) on image compression and reconstruction using images from the DIV2K and Flickr2K dataset31,32.

a Compressed images (64 × 64 pixels each) obtained using an 8 × 8 real kernel with 2% gaussian white noise (SNR = 50). The original image was 512 × 512 pixels and is shown in Fig. 2(a). b Reconstructed image (512 × 512-pixel) from the compressed images shown in (a). c Mean PSNR (red) and SSIM (blue) of the reconstructed images from the test data as a function of SNR. The error bars correspond to 1 standard deviation. d Compressed images (64 × 64 pixels each) obtained using an 8 × 8 complex kernel with 2% gaussian white noise (SNR = 50). The original image was 512 × 512 pixels and is shown in Fig. 2(a). e Reconstructed image (512 × 512-pixel) from the compressed images shown in (d). f Mean PSNR (red) and SSIM (blue) of the reconstructed images from the test data as a function of SNR. The error bars correspond to 1 standard deviation.

Experimental image compression and denoising

Next, we performed experiments to validate two key predictions:

(1) That our proposed approach (using a fixed, linear, analog photonic random matrix for compression and a non-linear neural network for decompression) provides image compression of comparable quality to JPEG (which uses an image-dependent compression scheme that is far more energy- and time-intensive).

(2) That this technique can be used for both denoising and compression.

For experimental validation of our image processing approach, we fabricated a prototype device on a silicon photonics platform. The experimental device included \(N\)=16 single-mode input waveguides connected to the scattering layer through a multimode waveguide. The multimode waveguide region allowed light from each single-mode waveguide to spread out along the transverse axis before reaching the random scattering layer. This ensured that we obtained a uniformly distributed random transmission matrix without requiring an excessively long random scattering medium, which would introduce excess loss through out-of-plane scattering. To illustrate the impact of the multimode waveguide, we performed full-wave numerical simulations of single-mode waveguides either connected directly to the scattering layer or connected through an intermediate multimode waveguide region. In the first case, shown in Fig. 5a, the scattering layer was not thick enough for light to fully diffuse along the transverse axis, resulting in a high concentration of transmitted light near the position of the input waveguide. In terms of a \({T}\) matrix, this resulted in stronger coefficients along the diagonal rather than the desired, uniformly distributed random matrix. We then simulated the effect of adding a 32 μm multimode waveguide between the single-mode input waveguide and the scattering layer. As shown in Fig. 5(b), the multimode waveguide allowed light from the single-mode waveguide to extend across the scattering layer, resulting in a transmitted speckle pattern that was uniformly distributed.

Fig. 5: Numerical simulations and experimental characterization.

a, b Full-wave frequency-domain numerical simulations of the optical encoder without (a) and with (b) a 32 µm multimode waveguide in front of a 15 µm scattering region. Addition of the multimode waveguide region leads to lateral spreading of the input light along Y, resulting in a random transmission matrix. The plotted intensity along Y (normalized by the width W) is obtained by integrating the intensity along X within the region shown by the red boxes. c Scanning electron micrograph of the fabricated silicon-photonics-based all-optical image encoder. d An example of a typical experimental measurement performed for characterization of the transmission matrix of the encoder.

The device was fabricated using a standard silicon-on-insulator wafer with a 250 nm thick silicon layer. The fabricated device consisted of 16 single-mode input waveguides connecting the device to the edge of the chip. The waveguides were 450 nm wide and separated by 3 \(\mu m\) (corresponding to a spacing of ~2\(\lambda\) at a wavelength of 1550 nm to minimize evanescent coupling). All 16 waveguides were connected to a 55.2 \(\mu m\) wide, 120 \(\mu m\) long multimode waveguide region, followed by a 30 \(\mu m\) long scattering region. The scattering region consisted of randomly placed 50 nm radius cylinders with a 3% filling fraction etched in the silicon waveguiding layer. The scattering layer parameters were empirically optimized to achieve a transmission of ~30% [see Supplementary Information: Section S5. Additional Experimental Characterization for experimental results]. To minimize leakage of light at the edges of the scattering layer, we added a full band-gap photonic crystal layer on the sides of the scattering layer34,35,36. We experimentally confirmed that transmission through the device was ~30%, as described in the Supplementary Information [Supplementary Information: Section S5. Additional Experimental Characterization]. Since this initial prototype did not include integrated photodetectors, we etched a ridge in the silicon waveguiding layer after the scattering region. This allowed us to record the light scattered out-of-plane from this ridge to measure the optical power which would be recorded if detectors were integrated in the device. Scanning electron microscope images of the fabricated device are shown in Fig. 5c.

We note here that the design goals of the random scattering medium for this application are significantly different compared with prior applications of on-chip scattering media such as the speckle spectrometer reported in Ref. 36. While a speckle spectrometer relies on obtaining distinct random projections for different input wavelengths, necessitating a relatively large, strongly-scattering region, here we desire a scattering medium with a broadband response (to minimize temperature dependence), but distinct, uniformly distributed random projections for different spatial modes. The compression device should also be designed to minimize the footprint and loss while still providing the type of fully distributed random transmission matrix required for high-quality image compression. As shown in Fig. 5 a, b, we found that a multimode waveguide region followed by a short scattering region allowed us to achieve this combination since it allowed each spatial input to overlap before reaching the scattering medium. This provided a uniformly distributed transmission matrix without requiring a large scattering region which would add significant loss.

To test the device, we first measured the \(T\) matrix by coupling an input laser operating at a wavelength of 1550 nm into each single mode waveguide and recording the speckle pattern scattered from the detection ridge after the scattering layer using an optical microscope setup. A typical image recorded using the optical setup is shown in Fig. 5d.

In order to account for experimental noise in the image compression and recovery process, we recorded two sequential \(T\) matrices, as shown in Fig. 6a, b. The \({T}\) matrix was highly repeatable, as revealed in Fig. 6c, which shows the difference between the two measurements. A histogram of the difference in the matrices, shown in Fig. 6d, indicates a gaussian-like random noise with amplitude ~1% of the average signal value (corresponding to a measurement SNR ~ 100). As shown in Fig. 4, at this SNR, both real and complex transformations provide similar results in terms of image reconstruction. This implies that we can use the experimentally measured intensity transmission matrix for image compression.
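A minimal sketch of this repeatability analysis, assuming the two sequential measurements are stored as arrays T1 and T2 (the synthetic arrays below merely stand in for the measured data):

```python
import numpy as np

def estimate_measurement_noise(T1, T2):
    """Estimate the relative noise level from two sequential T-matrix measurements."""
    diff = T1 - T2
    sigma_single = diff.std() / np.sqrt(2)   # per-measurement noise, assuming the two
                                             # measurements carry independent noise
    return sigma_single / T1.mean()          # noise amplitude relative to the mean signal

# Synthetic stand-ins for the two measured matrices
rng = np.random.default_rng(4)
T_true = rng.uniform(0, 1, (16, 200))
T1 = T_true + 0.005 * rng.standard_normal(T_true.shape)
T2 = T_true + 0.005 * rng.standard_normal(T_true.shape)
print(estimate_measurement_noise(T1, T2))    # relative noise level (~1% in the experiment)
```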

Fig. 6: Experimental demonstration of denoising and image compression using images from the DIV2K and Flickr2K dataset31,32.

a, b Experimental measurements of the transmission matrix of the encoder shown in Fig. 5c. The two measurements shown in (a) and (b) are of the same device but measured at different times. The transmission matrices are extracted from a series of measurements similar to the one shown in Fig. 5d. The x-axis of the data corresponds to the pixel number at the output (image shown in Fig. 5d) and the y-axis corresponds to the input channel number. c 2D plot showing the difference in magnitude of the elements of the transmission matrices measured in (a) and (b). The difference in magnitude is on the order of ~1%. d Histogram showing the distribution of the difference values measured in (c). The distribution corresponds to a gaussian distribution centered around 0. The red-dotted line shows the gaussian function fitted to the experimentally measured data. The red-dotted gaussian function was used as the noise source in the reconstruction algorithms. e An example of one of many compressed images taken from the DIV2K and Flickr2K dataset31,32, where the compression was done using the experimentally measured transmission matrix. The original image is shown in Fig. 2(a) and the compression ratio is 1:4, with each compressed image being 128 × 128 pixels. f Reconstructed 512 × 512-pixel image from the compressed image shown in (e).

We note here that the experimental noise is due to noise generated by the laser and the electronics, as expected for a real application. The photonic encoder, on the other hand, is extremely stable and provides a highly repeatable response. To confirm this, we monitored the transmission matrix for 60 h and the results are shown in the Supplementary Information [Supplementary Information: Section S5. Additional Experimental Characterization]. We found that the device is very stable, with negligible fluctuations over the span of 60 h, without requiring active temperature stabilization. This level of stability for an integrated photonic device is not surprising given the short lifetime of light passing through the scattering region, corresponding to a low effective quality factor with minimal temperature dependence. Regarding temperature dependence, based on our previous work36 and assuming a thermo-optic coefficient in Si of dn/dT ≈ 1.8 × 10−4 K−1 (ref. 37), the generated speckle pattern at the output will stay correlated for temperature changes up to ±4 K. This stability is one of the advantages of the compact scattering device structure used in this work. Moreover, as shown later, using our unique approach of combining image compression with denoising, some of the noise introduced by thermal fluctuations during image compression could potentially be suppressed by training the back-end image reconstruction neural network using data acquired at a range of temperatures.

To convert the raw measured transmission matrix into the \(T\) matrix used for compression, we selected 4 non-overlapping spatial regions along the output ridge shown in Fig. 5d. This corresponds to selecting 4 columns of the matrix shown in Fig. 6a. This updated \(T\) matrix had dimensions of \(16\times 4\), providing a compression ratio of 1:4. We then used this experimental matrix to train the back-end neural network required to reconstruct the original image. Note that we included noise in the training process by adding gaussian noise with the same 1% amplitude measured experimentally. Finally, we compressed the test images in the DIV2K and Flickr2K dataset while again adding random noise with 1% amplitude. A typical compressed image using the experimentally measured \({T}\) matrix is shown in Fig. 6e and the corresponding reconstructed image is shown in Fig. 6f. Excellent agreement between the original and reconstructed images can be seen, with PSNR = 26.02 dB and SSIM = 0.91. In comparison, 1:4 JPEG compression of the same image with similar SNR gives PSNR = 29.63 dB and SSIM = 0.83. We repeated this process for the entire set of test images and obtained an average PSNR of \(26\pm 4\) dB and SSIM of \(0.9\pm 0.07.\) Additional examples of compressed and reconstructed images along with comparisons to JPEG compression are shown in the Supplementary Information [Supplementary Information: Section S1. Additional Experimental results: Compressed and reconstructed images and their statistics and Section S6. Comparison of Digital JPEG and Photonic Compression].
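These steps can be sketched as follows, under the binning interpretation of Fig. 1 (summing the measured outputs within each of the 4 spatial regions); the raw matrix, noise level, and region boundaries below are placeholders for the measured values:

```python
import numpy as np

def build_compression_matrix(T_raw, M=4):
    """Bin the measured output pixels into M non-overlapping spatial regions,
    reducing a raw (16 x P) measured matrix to the 16 x M compression matrix."""
    bins = np.array_split(np.arange(T_raw.shape[1]), M)
    return np.stack([T_raw[:, b].sum(axis=1) for b in bins], axis=1)

rng = np.random.default_rng(5)
T_raw = rng.uniform(0, 1, (16, 200))          # placeholder for the measured matrix
T = build_compression_matrix(T_raw, M=4)      # 16 x 4, i.e., 1:4 compression of 4x4 blocks

block = rng.uniform(0, 1, 16)                 # flattened 4 x 4 pixel block
compressed = block @ T                        # 4 compressed values for this block
compressed += 0.01 * compressed.mean() * rng.standard_normal(4)   # ~1% gaussian noise
```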

Finally, in addition to showing that this approach is robust to noise introduced in the analog photonic image compression step, this demonstration also illustrates how this technique could be used for image denoising. From the perspective of the back-end image reconstruction neural network, noise added during the original image acquisition process (e.g., due to pixel noise, non-uniform responsivity, or simply low light levels) is equivalent to noise added during the image compression step (as tested explicitly here using experimental data for noise). Thus, this work also highlights the potential for this technique to move the energy-intensive image conditioning and denoising steps to the back-end image reconstruction stage.

While this initial demonstration was performed at low speed because of the lack of integrated detectors and modulators, this approach is compatible with high-speed operation (e.g., >10 GHz). Our compression device can be considered as a photonic communication link in which data is encoded on an optical carrier using the input modulators, transmitted through the scattering region (analogous to transmission along a bus waveguide or through a fiber in a communications link), and recorded on a high-speed photodetector. Since the ~30% transmission through the scattering region corresponds to a relatively modest insertion loss, we expect the compression device to operate with comparable SNR to photonic links operating at similar speeds. One potential complication is that noise will be added during the data encoding step (i.e., on the high-speed integrated modulators). However, our approach allows us to compress and denoise simultaneously, and the image compression process works well even for SNR as low as 10 (as shown in Fig. 4f). In addition, we performed simulations in which we calculated the compression quality as a function of noise added to the input image (simulating the effect of noise introduced by the modulators while encoding the inputs). The results are shown in the Supplementary Information (S4: Denoising Images), and they confirm that our approach is quite robust to noise introduced by the input modulators.

Predicted energy consumption and operating speed for the photonic image processor

As described above, our encoding and compression technique can be reduced to a matrix multiplication operation. In order to compare the power consumption using our photonic approach with a traditional electronic scheme, we estimated the energy per multiply-accumulate (MAC) operation using both approaches. Electronic hardware accelerators have been thoroughly optimized to reduce the power consumption per MAC.

The power consumed by the photonic image processing engine includes contributions from the power consumed by the laser, the optical modulators, and the photodetectors. To estimate the required laser power, we first estimated the required detected power to provide sufficient signal-to-noise for accurate image compression. Assuming shot-noise limited detection, we can express the required optical power reaching each photodetector as38:

$${P}_{{Rx}}={2}^{2{ENOB}}q{f}_{0}/{{{{{\mathscr{R}}}}}}$$
(1)

where ENOB is the required effective number of bits, \(q\) is the charge of a single electron (1.6 × 10−19 coulombs), \({f}_{0}\) is the operating frequency of the modulator (and the detector baud rate), and \({{{{{\mathscr{R}}}}}}\) is the responsivity of the photodetector in units of \(A/W\). The ENOB can be related to the measurement SNR in dB as \({SNR}=6.02\times {ENOB}+1.72\)38. In the energy consumption calculations below, we assumed a required \({ENOB}\) of 6, corresponding to a measurement SNR of 38 dB, which provides significant margin compared with the experimentally measured SNR of 17 dB. Based on the required power at the detector, we can work backwards to estimate the required laser power as

$${P}_{{laser}}=\frac{N\times {P}_{{Rx}}}{{T}_{{{{{\mathrm{mod}}}}}}{T}_{{scatter}}}$$
(2)

where \(N\) is the number of pixels in an image block, \({T}_{{{{{\mathrm{mod}}}}}}\) is the transmission through the optical modulators, and \({T}_{{scatter}}\) is the transmission through the scattering medium. The electrical power required to drive the laser can then be written as \({P}_{{laser}}/\eta\), where \(\eta\) is the wall-plug efficiency of the laser. The factor of \(N\) in Eq. (2) implies that the multimode waveguide and scattering region support \(N\) spatial modes (the minimum required to efficiently couple light from \(N\) single-mode input waveguides) and each detector collects, on average, \(1/N\) of the light transmitted through the scattering medium. In our preliminary experiment, presented above, a slightly larger multimode waveguide than required was used to simplify the experiment. As a result, the optical power was distributed over more than \(N\) modes in our initial demonstration. In the future, adiabatically coupling the single-mode input waveguides into an \(N\)-mode multimode waveguide would optimize the power efficiency.

The power required by the optical modulators can be expressed as39

$${P}_{{Mod}}=\frac{1}{2}{C}_{{Mod}}{V}_{{pp}}^{2}{f}_{0}$$
(3)

where \({C}_{{Mod}}\) is the capacitance and \({V}_{{pp}}\) is the peak-to-peak driving voltage of the modulator. The power required by the photodetectors can be approximated as

$${P}_{{PD}}\approx {V}_{{bias}}{{{{{\mathscr{R}}}}}}{P}_{0}$$
(4)

where \({V}_{{bias}}\) is the bias voltage of PN junction. The total electrical power consumed by the photonic image-processing engine can then be calculated as

$${P}_{{total}}={P}_{{laser}}/\eta+N\times {P}_{{{{{\mathrm{mod}}}}}}+M\times {P}_{{PD}}$$
(5)

Since the total number of MACs per second is \(N\times M\times {f}_{0}\), the energy consumption per MAC is given by \({P}_{{total}}/\left(N\times M\times {f}_{0}\right)\). After substituting Eq. (1) into the expressions for \({P}_{{laser}}\) (Eq. (2)) and \({P}_{{PD}}\) (Eq. (4)), we see that the total energy consumption per MAC is independent of the modulation frequency.

To quantitatively compare the energy per MAC required by an optimized photonic processing engine with a conventional electronic GPU, we assumed typical specifications for the opto-electronic components. \({C}_{{Mod}}\) is usually on the order of 1fF, \({V}_{{pp}}\) is ~1 V, \({V}_{{bias}}\) is typically 3.3 V, and \({{{{{\mathscr{R}}}}}}\) is typically ~1 mA/mW at a wavelength of 1550 nm40,41. In addition, typical insertion loss for high-speed optical modulators is ~ 6.4 dB (\({T}_{{{{{\mathrm{mod}}}}}}=0.27\)) and the wall-plug efficiency for distributed feedback lasers is \(\eta=0.2\)42,43\(.\) The transmission through the experimental scattering medium is assumed to be \({T}_{{scatter}}=0.2\) which also takes into account coupling efficiency to the integrated photodetectors. To be specific, our scattering medium provides a transmission of 30% as shown in the Supplementary Information [Supplementary Information: Section S5. Additional Experimental Characterization]. However, we assumed 20% overall transmission, which included ~67% transmission to the photodetectors (see Supplementary Information: Section S3. Integration of photonic encoder with silicon photonics and CMOS components for more details).
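Combining Eqs. (1)-(5) with the component values quoted above gives the following back-of-the-envelope sketch (the choice of M, i.e., the compression ratio, and the exact responsivity are assumptions, so the printed numbers are indicative rather than exact; as noted above, the energy per MAC and per pixel do not depend on the chosen f0):

```python
# Energy budget for the photonic encoder, following Eqs. (1)-(5) in the text.
ENOB = 6                 # effective number of bits (measurement SNR ~ 38 dB)
q = 1.6e-19              # electron charge [C]
f0 = 16e9                # modulator/detector rate [Hz]; cancels out of energy/MAC
R = 1.0                  # photodetector responsivity [A/W] (~1 mA/mW)
T_mod, T_scatter = 0.27, 0.2
eta = 0.2                # laser wall-plug efficiency
C_mod, V_pp, V_bias = 1e-15, 1.0, 3.3
N, M = 64, 16            # 8x8 kernel; M = 16 (1:4 compression) is an assumption

P_rx = 2 ** (2 * ENOB) * q * f0 / R              # Eq. (1): power required per detector
P_laser = N * P_rx / (T_mod * T_scatter)         # Eq. (2): optical power from the laser
P_mod = 0.5 * C_mod * V_pp ** 2 * f0             # Eq. (3): electrical power per modulator
P_pd = V_bias * R * P_rx                         # Eq. (4): electrical power per detector
P_total = P_laser / eta + N * P_mod + M * P_pd   # Eq. (5): total electrical power

energy_per_mac = P_total / (N * M * f0)
energy_per_pixel = P_total / (N * f0)
throughput_pixels_per_s = N * f0                 # ~1 Tpixel/s at 16 GHz with N = 64
print(P_laser, energy_per_mac, energy_per_pixel, throughput_pixels_per_s)
```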

The estimated energy consumption per MAC as a function of the image block size \(N\) is shown in Fig. 7. The energy required by the photonic image processor decreases rapidly with image block size. We also find that the majority of the power consumption is driven by the laser; \({P}_{{laser}}\) is 9.2 mW for a kernel size of \(8\times 8\) pixels and an ENOB of 6. While this power level is readily available from most commercial lasers operating in this wavelength regime (near 1550 nm), lower power consumption could be achieved if a lower \({ENOB}\) were sufficient for a given application, which would enable a lower power laser (see Fig. 4c, f for an analysis of the trade-off between image reconstruction quality and the SNR of the compressed image). Nonetheless, we find that for an image block size of \(8\times 8\) pixels, the photonic image processor has the potential to provide 100X lower power consumption than a typical GPU. Although the photonic processor is even more efficient when using larger image blocks, larger blocks can degrade the image reconstruction, as shown in Fig. 2d. In the future, alternative inverse-designed transforms could enable large pixel blocks without sacrificing image reconstruction fidelity.

Fig. 7: Comparison of energy consumption for electronic and all-optical encoding approaches for image compression.

Comparison of energy consumption (given by Energy/MAC) as a function of the number of input pixels N for encoding via matrix multiplication using electronic approaches such as GPU-, SoC-, or ASIC-based processors (black solid line with a red band) and the presented optical approach using a photonic encoder with average transmission \({T}_{{scatter}}=20\%\) (blue solid line). For the electronic approaches, the typical energy efficiency of these digital accelerators is on the order of ~1 pJ/MAC and the red band shows the range of values for different accelerators. For the photonic approach, the color-coded dashed lines correspond to the different components contributing to the total energy consumption: the laser (dashed red line), modulators (dashed yellow line), and detectors (dashed magenta line). The vertical gray line corresponds to energy consumption values for an 8 × 8 kernel, i.e., a 64-input pixel block. The energy consumption is dominated by the input laser and the overall energy efficiency of the presented optical approach improves as N, the number of input pixels, increases.

We can also use this framework to estimate the energy consumption per pixel, which is calculated as \({P}_{{total}}/\left(N\times {f}_{0}\right)\). The energy per pixel is independent of both the modulation frequency and the size of the pixel blocks and, for an \({ENOB}\) of 6 and the parameters listed above, can be as low as 72 fJ. This is dramatically lower than the ~0.1 \(\mu\)J used in existing image processing systems; however, the latter also includes the power required to operate the pixels and the analog-to-digital conversion process used to extract the signal recorded by each pixel. Nevertheless, since more than 50% of the energy consumed by standard electronic image processing systems is dedicated to image compression and conditioning, our opto-electronic approach can contribute significantly to reducing the overall energy consumption.

Finally, the device throughput, in terms of pixels/second, can be estimated as \(N\times {f}_{0}\). Assuming an image block size of \(8\times 8\) (\(N=64\)), this approach can process a Terapixel/second of image data using a clock speed of ~16 GHz, which is readily achieved using standard optical modulators and photodetectors16,17,18. While such compression would require significant temporal multiplexing to provide the compression engine with 64-pixel blocks at a rate of 16 GHz, current on-board memory can buffer as much as 1 Gpixel of data to feed the compression engine for real-time processing. Since the compression engine can process 1 Tpixel/s (or 1 Gpixel/ms), this is sufficient to keep up with the data rate acquired by a Gpixel camera operating at 1 kHz (well beyond the current state of the art). This speed is far beyond the compression speeds of conventional state-of-the-art digital electronic schemes such as JPEG compression, which can compress at most around a gigapixel per second.

While the analysis presented above compares the energy consumption of our photonic approach to GPUs implementing the same random transform-based compression algorithm, another important comparison is the energy required for JPEG compression using the variety of processors that might be found in a digital camera or smartphone. To calculate the energy consumption for JPEG compression19, we first analyzed the number of operations required by the JPEG algorithm. The original JPEG compression algorithm performs a discrete cosine transform (DCT) followed by thresholding (to store the most significant basis functions) for every 8 × 8, non-overlapping kernel of pixels. To encode an image with \(N\times N\) pixels, \({N}^{2}/64\) DCTs, each requiring 64 × 64 multiply-and-accumulate operations (MACs), are necessary, totaling 64\({N}^{2}\) MACs. The more recent JPEG 2000 compression algorithm44 decomposes the image into wavelet representations by successively passing it through 2D \(n\)-tap filters and applying 2 × 2 down-sampling. For an image with \(N\times N\) pixels undergoing \(K\)-level decomposition, the total number of MACs is \(4n\mathop{\sum }\nolimits_{j=0}^{K-1}{\left(N/{2}^{j}\right)}^{2}\). The computing complexity of both JPEG and JPEG 2000 therefore scales as \(O({N}^{2})\). This is the same computing complexity that our algorithm would require if the random projections were implemented using digital electronics rather than analog photonics.
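These operation counts can be tallied as in the short sketch below (the decomposition depth K, filter length n, and per-block output count M are illustrative values, not taken from a specific codec):

```python
N = 512       # image is N x N pixels
K, n = 3, 8   # illustrative wavelet decomposition depth and filter length for JPEG 2000
M = 8         # outputs per 8x8 block for the random projection (1:8 compression)

jpeg_macs = 64 * N ** 2                                        # (N^2/64) DCTs x 64*64 MACs each
jpeg2000_macs = 4 * n * sum((N // 2 ** j) ** 2 for j in range(K))
random_projection_macs = (N ** 2 // 64) * 64 * M               # if implemented digitally

print(jpeg_macs, jpeg2000_macs, random_projection_macs)        # all scale as O(N^2)
```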

Modern digital cameras and smartphones use an embedded system-on-chip (SoC) as the core processing unit. These SoCs typically include ARM cores for general-purpose computing, memory interfacing, and device control, along with specifically designed hardware codecs (e.g., the JPEG/HEIF encoder in Canon DIGIC) or image-processing ASICs (e.g., Apple A16 Bionic) for JPEG compression and decoding. The power consumption of these codecs or ASICs can range from 0.5 to 20 pJ/MAC45. Because digital hardware codecs are designed for streaming pipelined, sequential inputs of image blocks, the total computation (and hence energy consumption) also grows as \(O({N}^{2})\). Therefore, the energy per MAC remains constant regardless of the image size, sharing a similar scaling trend with GPUs. Moreover, the energy/MAC for a GPU (~1 pJ/MAC) is comparable to that of SoCs and ASICs and is plotted in Fig. 7.

Thus, to conclude, the power consumption for JPEG compression on a smartphone or digital camera is comparable to that of a GPU and exhibits the same scaling; the advantage of using a GPU for compression is that it might enable higher throughput by leveraging parallel processing. This analysis makes clear that the photonic compression engine outperforms both in terms of speed and energy consumption, with the potential for orders-of-magnitude lower power consumption than digital electronics solutions based on GPUs, ASICs, or SoCs. Table 1 in the Supplementary Information (Section S2. Energy consumption typical of mainstream electronic architectures) summarizes the energy consumption typical of mainstream electronic architectures, including desktop processors, with the most efficient schemes reaching ~1 pJ/MAC.

Discussion

In summary, we proposed a CMOS-compatible silicon photonics-based approach for large-scale image processing. Our approach performs image compression and de-noising using an auto-encoder framework in which the first layer of the network is implemented using analog photonics, while the back-end image reconstruction is implemented using digital electronics. Our approach enables image compression with comparable quality to standard digital compression techniques such as JPEG and image denoising quality comparable to state-of-the-art digital neural network based de-noising algorithms. In contrast to the prevailing image processing approach, which performs a large number of image conditioning tasks at the front end, our approach is designed to compress the raw image data and use the neural network back-end to both reconstruct the original image and perform the image denoising and conditioning operations.

In this work, we presented a combination of numerical simulations and experimental characterization of a proof-of-principle device to validate this approach. Using numerical simulations, we optimized the system design by evaluating the compression quality as a function of kernel size and transmission matrix type (positive, real-valued vs. complex-valued). These simulations confirmed that this scheme can provide comparable quality compression to the JPEG algorithm. We also evaluated the impact of noise on the image compression quality and demonstrated the ability of this scheme to perform de-noising. Finally, we analyzed the potential for this scheme to perform high-throughput processing with low power consumption. For example, by processing 8 × 8 blocks of pixels (i.e., 64 inputs) in parallel at ~16 GHz, this approach could process 1 Terapixel/s. This would enable large format image sensors such as Gigapixel cameras to operate at speeds up to 1 kHz or standard Megapixel imaging systems to operate at frame rates up to 1 MHz. From a power-consumption perspective, we found that an optimized photonic encoder would consume 100X less energy per MAC than a GPU. In addition to the numerical simulations and analysis, we experimentally characterized a passive silicon photonic prototype with 16 inputs designed to process 4 × 4 pixel kernels and demonstrated compression using a real-valued experimental transmission (T) matrix. This confirmed that the properties of the experimental transmission matrix enable image compression with similar quality to the digital JPEG algorithm and image denoising with similar quality to digital electronic neural network based denoising algorithms. This experimental characterization also confirmed that this approach is robust to fabrication imperfections, provided calibration is performed after fabrication. Future work will focus on integration of high-speed modulators and detectors and increasing the kernel size to support 8 × 8 pixel blocks.

While this work focused on compression and denoising of grayscale images, the general approach could be used to compress red-green-blue (RGB), hyperspectral, or time-series image data. Moreover, this general scheme is amenable to a variety of image processing tasks other than compression, including inference and classification. Again, the analog photonic transform could form the first layer of a neural network, accelerating the initial time- and energy-intensive processing of high-dimensional image data, while relying on back-end digital electronics to complete the network. By tailoring this back-end neural network, the same photonic image processing engine could be applied to a variety of image processing and remote sensing applications.

Although we utilized a random scattering layer to generate the random encoding in this work, alternative approaches may be advantageous in some applications. For example, multimode waveguides46,47,48 or chaotic cavities49 could also be used to perform random encoding for compression and have the potential for lower loss due to reduced out-of-plane scattering.

Finally, although we focused on using a linear encoding of the raw image data for compression, nonlinear, multi-layer encodings could potentially enable higher compression ratios. One trade-off is that nonlinear encoding schemes are likely to require higher power consumption (due to optical attenuation and/or the power required to drive the nonlinear process). Nonetheless, the potential for improved compression ratios and the ability to process larger kernel sizes at once make this an intriguing approach that we hope to investigate in future studies.

Methods

Neural processing algorithm at the backend

All the images tested were from the DIV2K and Flickr2K datasets. A dataset of 4152 grayscale images, each with a resolution of 512 × 512 pixels, was generated through cropping and grayscale conversion. The dataset was divided into a training set of 3650 images and a validation set of 502 images. To study the image compression process, numerically generated random transmission matrices as well as experimentally measured transmission matrices were used as the encoding matrix. Using the original images as the ground truth and the compressed measurements as input, a convolutional neural network (CNN) was trained to reconstruct the ground truth from the compressed measurements. The networks used were based on Deep ResUNet50 and ResUNet++51. To investigate the impact of different compressive kernel sizes on the networks’ ability to reconstruct compressed images, four kernel sizes were explored (Fig. 2). The network architecture for the 4 × 4 kernel size consisted of 1 initial layer, 11 residual blocks, 2 downsampling layers, 4 upsampling layers, and 1 final convolutional layer. The residual block architecture was based on the residual neural network52; each block consisted of a Conv(3 × 3)-BN-LeakyReLU-Conv(3 × 3)-BN-[Conv(1 × 1)]-LeakyReLU block, with a Conv(1 × 1) layer added in the residual connection (marked with brackets). The downsampling (upsampling) rate in each downsampling (upsampling) layer was 2. Each downsampling layer consisted of a Conv(3 × 3)-BN block, in which the convolutional layer had padding = 1 and stride = 2. Each upsampling layer used ConvTranspose2d with kernel size 2 × 2 and stride = 2. The initial layer was a Conv(3 × 3)-BN-LeakyReLU-Conv(3 × 3)-Conv(3 × 3) layer, which produced 64 feature maps, and the final convolutional layer had kernel size 1 × 1. For the networks with other kernel sizes, additional upsampling layers and residual blocks were added to maintain the number of layers and the size of the final output tensor. The network was trained in PyTorch with the mean squared error loss, Xavier initialization53, and the Adam optimizer54, with an initial learning rate of 0.001 and a learning-rate decay of 0.1 every 400 epochs, for 800 epochs in total. The training was performed on four Nvidia V100S GPUs with a batch size of 64.
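For concreteness, a minimal PyTorch sketch of the residual, downsampling, and upsampling blocks described above is shown below; the channel counts, LeakyReLU slope, and model assembly are assumptions for illustration rather than the exact implementation used in this work.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv(3x3)-BN-LeakyReLU-Conv(3x3)-BN body with a Conv(1x1) in the
    residual connection, followed by a final LeakyReLU (illustrative sketch)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # residual-path Conv(1x1)
        self.act = nn.LeakyReLU()

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

def downsample(in_ch, out_ch):
    """Conv(3x3)-BN block with stride 2 (downsampling rate of 2)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, stride=2),
        nn.BatchNorm2d(out_ch),
    )

def upsample(in_ch, out_ch):
    """ConvTranspose2d with a 2x2 kernel and stride 2 (upsampling rate of 2)."""
    return nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)

# Training setup mirroring the description (network assembly omitted):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=400, gamma=0.1)
# criterion = nn.MSELoss()
```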

Full-wave electromagnetic simulations

The simulated fields shown in Fig. 5a, b were obtained using the finite element method (COMSOL with the Electromagnetic Waves, Frequency Domain interface) in a 2D geometry. The length of the simulated device was scaled down to reduce the computational time and consisted of a scattering region 15 µm in length followed by a 10 µm long output port. Perfect conductors were substituted for the photonic crystal reflectors on the top and bottom sides of the scattering region. The scattering region contained a randomly generated set of 6180 holes with a radius of 50 nm. Perfectly matched layer (PML) boundary conditions were used on the boundaries of the observation region. Ports were used to excite the input waveguides (width 0.45 µm), which had a pitch of 3.45 µm, with 16 inputs in total, giving a total device width of 55.2 µm (similar to the fabricated structure). The pre-scattering-region length, corresponding to the multimode waveguide region between the single-mode input waveguides and the scattering region, was varied between 50 nm and 32 µm, and a 4 µm air buffer was included at the side edges before the absorbing PML boundaries. The effective material index of the silicon domains used for the 2D simulations was determined from 3D simulations and found to be ne = 2.83; the effective material index of the air domains was ne = 1. A triangular mesh with 1,256,090 elements and an average element quality of 0.92 was used. A boundary mode analysis step was performed for each input waveguide port, and the model was simulated at λ = 1550 nm. To analyze the electromagnetic fields in the output region, the fields were exported on a regular grid with 100 nm steps in X and Y.
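For reference, a minimal sketch of the first post-processing step on the exported fields is shown below; the file name and column layout of the export are assumptions, not the exact format used in this work.

```python
import numpy as np

# Load fields exported on the regular 100 nm grid. Assumed column layout:
# x, y, Re{Ez}, Im{Ez}; lines beginning with '%' are treated as headers.
x, y, ez_re, ez_im = np.loadtxt("output_region_fields.txt", comments="%", unpack=True)

# Sort the samples onto the regular grid and form the intensity map |Ez|^2.
xu, yu = np.unique(x), np.unique(y)
order = np.lexsort((x, y))  # sort by y, then x, so the reshape below is valid
intensity = (ez_re**2 + ez_im**2)[order].reshape(yu.size, xu.size)
```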

Sample fabrication

The silicon-photonics image encoder was fabricated using commercially available silicon-on-insulator (SOI) wafers, which consisted of 250 nm of silicon on top of a 3 µm buried oxide. The encoder was patterned using a positive-tone ZEP resist and electron-beam lithography, followed by inductively coupled plasma reactive-ion etching. The encoder consisted of N = 16 input single-mode waveguides, each with a width of 450 nm. At the output, a ridge was fabricated to scatter the light out of plane; this scattered light was measured to determine the transmission matrix of the encoder. The input waveguides were separated by 3.45 µm to minimize cross-coupling and were adiabatically tapered out towards the edge of the chip to increase the spacing between them to 10 µm, ~10X the size of the focused laser spot used to couple in the input light. This ensured that only one input waveguide was excited at a time during the measurement of the transmission matrix.

Experimental measurement

To measure the transmission matrix of our encoder (Fig. 5), we used an aspheric lens to couple continuous-wave (CW) laser light at λ = 1550 nm into the chip, focusing the beam to a spot with a diameter of 1.5 µm at the edge of the waveguides. During the measurement, the focused laser spot was scanned to couple to each input waveguide in turn while the transmitted speckle pattern was recorded from above using a long-working-distance objective (50×, NA = 0.7) and an InGaAs camera (Xenics, Cheetah). The images acquired during the measurement were then processed to determine the transmission matrix, which was used for encoding the images. To quantify the experimental noise, the measurements were repeated, and the element-wise difference in magnitude between the resulting transmission matrices was taken as the experimental noise present in encoding the images.
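A minimal sketch of this post-processing is shown below; the binning of the speckle pattern into output channels and the lack of normalization are illustrative assumptions rather than the exact pipeline used in this work.

```python
import numpy as np

def transmission_matrix(speckle_frames, output_masks):
    """Assemble a real-valued transmission matrix from camera frames recorded
    while exciting one input waveguide at a time.

    speckle_frames: array of shape (N_inputs, H, W), one frame per input
    output_masks:   list of M boolean masks selecting each output bin
    Returns T with shape (M, N_inputs).
    """
    n_inputs = speckle_frames.shape[0]
    T = np.zeros((len(output_masks), n_inputs))
    for j in range(n_inputs):
        for i, mask in enumerate(output_masks):
            T[i, j] = speckle_frames[j][mask].sum()  # integrated intensity in bin i
    return T

def encoding_noise(T_run1, T_run2):
    """Element-wise magnitude difference between repeated measurements,
    used as an estimate of the experimental noise."""
    return np.abs(T_run1 - T_run2)
```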