In computational imaging, researchers make images using single-pixel detectors instead of multi-pixel image sensors found in conventional digital cameras. Single-pixel systems are cheaper and can build images at wavelengths where conventional cameras are expensive or simply don’t exist, such as at infrared or terahertz frequencies. However, they do have certain limitations (see Box). Now, a University of Glasgow team has taken a new approach to creating video using single-pixel cameras, which they say not only surmounts these limitations, but has the potential to change the way many imaging systems work in future.
‘Initially, the problem I was trying to solve was how to maximise the frame rate of the single-pixel system to make the video output as smooth as possible,’ says David Phillips, team leader at the University of Glasgow’s School of Physics and Astronomy. ‘However, I started to think a bit about how vision works in living things, and I realised that building a programme which could interpret the data from our single-pixel sensor along similar lines could solve the problem.’
Many animal species see clearly in the middle of their visual field – the part that corresponds to a region at the centre of the retina called the fovea. This means that the subject they are watching is in very sharp focus, while things in the periphery of their vision appear at much lower clarity. This is called foveated vision.
The Glasgow team set out to mimic this strategy, varying the resolution across an image. They placed an array of tiny mirrors, called a digital micromirror device (DMD), at the image plane of a camera lens – that is, at the plane where a multi-pixel sensor would sit in a conventional camera (Science Advances, doi: 10.1126/sciadv.1601782). The DMD rapidly masks the image of the scene with a set of binary patterns. A single-pixel light detector – a photodiode – records the total amount of light transmitted by each mask, giving a measurement of how strongly each mask correlates with the scene.
Knowledge of the transmitted intensities and the corresponding masks enables reconstruction of the image. The team demonstrated its system at visible wavelengths, but the technique is not limited to this range.
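The reconstruction step can be sketched in a few lines. The following is a minimal illustration – not the authors’ code – using Hadamard masks, whose orthogonality makes inversion a simple transpose; on a real DMD, the ±1 patterns would typically be obtained by differencing pairs of complementary binary masks.

```python
import numpy as np

# Build a 64x64 Hadamard matrix by the Sylvester construction:
# each row is one +/-1 masking pattern for an 8x8 (64-pixel) image.
H = np.array([[1.0]])
for _ in range(6):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))

scene = np.random.rand(64)        # stand-in for the unknown scene, flattened
readings = H @ scene              # one photodiode reading per displayed mask
# Hadamard rows are orthogonal (H @ H.T = 64 * I), so inversion is a transpose:
image = (H.T @ readings) / 64
print(np.allclose(image, scene))  # True: the scene is recovered exactly
```

With a full set of 64 patterns the recovery is exact, which is precisely why fully sampling a scene is slow: the number of required readings grows with the pixel count.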
In the images made with their single-pixel system, the team recorded the central area at high resolution while sampling the rest of the scene at lower resolution. They set the position of the high-resolution area – the ‘fovea’ – in each subframe by displaying a particular subset of patterns preloaded on the DMD. The system allocates its ‘pixel budget’ to prioritise the most important areas within the frame, placing a greater number of higher-resolution pixels in these locations, sharpening the detail of some sections while sacrificing detail in others. In conventional cameras, these pixels would be spread evenly in a grid across the image. The result is that the ‘foveated’ image has smaller, more densely packed pixels in the centre and larger pixels in the periphery.
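The variable-resolution idea can be illustrated with a toy helper (a hypothetical sketch, not the team’s implementation): keep native-resolution pixels inside a square fovea and replace the periphery with coarse block averages, so the total information per frame is concentrated where it matters.

```python
import numpy as np

def foveate(image, centre, fovea=16, coarse=8):
    """Toy foveation: block-average the periphery into coarse x coarse
    super-pixels, then restore full resolution in a square fovea of
    side `fovea` around `centre` (row, col). Illustrative only."""
    h, w = image.shape
    out = np.empty_like(image, dtype=float)
    for y in range(0, h, coarse):           # coarse super-pixels everywhere
        for x in range(0, w, coarse):
            out[y:y + coarse, x:x + coarse] = image[y:y + coarse,
                                                    x:x + coarse].mean()
    r = fovea // 2
    cy, cx = centre
    y0, y1 = max(cy - r, 0), min(cy + r, h)  # clip the fovea to the frame
    x0, x1 = max(cx - r, 0), min(cx + r, w)
    out[y0:y1, x0:x1] = image[y0:y1, x0:x1]  # fine detail in the fovea
    return out

frame = np.arange(64 * 64, dtype=float).reshape(64, 64)
fov = foveate(frame, centre=(32, 32))
```

Moving `centre` from frame to frame corresponds to redirecting the fovea, as the team does with real-time feedback control.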
In this new approach, the team produced square images with an overall resolution of 1000 pixels. This pixel distribution can be changed from one frame to the next, much as biological vision systems work when, for example, a person’s gaze is redirected from one person to another. Indeed, the team has shown that, using real-time feedback control, they can rapidly reposition the fovea to follow objects within the field of view and track motion within the scene. Yet, unlike a simple zoom, every frame delivers new spatial information from across the entire field of view. This strategy rapidly records the detail of quickly changing features in the scene while simultaneously accumulating detail of more slowly evolving regions over several consecutive frames.
‘By channelling our pixel budget into areas where high resolutions were beneficial, such as where an object is moving, we could instruct the system to pay less attention to the other areas of the frame,’ Phillips explains. ‘By prioritising the information from the sensor in this way, we’ve managed to produce images at an improved frame rate, but we’ve also taught the system a valuable new skill.’ While the degree of local frame-rate enhancement depends on the scene, the team has demonstrated improvements of up to a factor of four, helping to mitigate one of the main drawbacks of single-pixel imaging techniques.
Complementing these automated ‘fovea guidance’ techniques, the team has also designed a manual fovea control system, in which a user can click on the part of the scene they wish to view in high resolution. They also envisage other forms of manual control: for example, an operator could direct the fovea with a gaze tracker that measures eye movements, placing the high-resolution region wherever the operator looked.
Phillips says this method should complement existing compressive sensing approaches but has other important potential applications. It should enable imaging in a variety of situations that are challenging or impossible with multipixel image sensors. Examples include imaging at wavelengths where multipixel image sensors are unavailable, such as in the terahertz band, or in the presence of scattering media.
‘We’re keen to continue improving the system and explore the opportunities for industrial and commercial use, for example in medical imaging,’ he says. Terahertz medical devices are being investigated for surface imaging of conditions such as skin cancer and tooth decay, and for laboratory tests on thin tissue samples. Longer term, they might have a role in brain imaging, full-body scanning or tumour imaging.
Jesus Lancis Saez, an optics specialist with an interest in single-pixel systems at the Universitat Jaume I in Castelló de la Plana, Spain, says this is a novel approach to the real-time single-pixel camera. ‘Even though this foveated approach has been used in the imaging community, as far as I know, this is the first time that it has been applied to a single-pixel camera,’ he comments. ‘By doing this, they increase the frame rate without sacrificing resolution in some regions of the scene, while others are sampled fast but with lower detail. They choose where to put the region with high detail (the fovea) by mimicking human vision. By using movement-detection techniques, they locate regions where the scene is rapidly changing, and so they direct the fovea to that region.’
The Glasgow team’s approach to spatially varying resolution with a single-pixel camera is definitely novel, agrees Guy Satat, a research assistant in the Camera Culture group at MIT Media Lab, whose work focuses on new methods for medical imaging. ‘The single-pixel camera framework is important for imaging in parts of the spectrum where pixel arrays are not available, or when using a lens is challenging. The paper tackles the problem of long acquisition time, which is one of the main limitations of a single-pixel camera. The authors suggest a clever approach to trading off resolution against acquisition time by recovering images with spatially variant resolution. This technique enables high-frame-rate recovery of high-resolution images in the area of interest while maintaining low resolution in other parts of the scene.’
Lancis Saez agrees that there are many situations that could benefit from the foveated imaging approach: cellular division or movement, for example, could easily be monitored if the system were built into a microscopy setup. Such systems could benefit from single-pixel strengths, he adds. However, he is unsure whether the Glasgow method will become established as the norm in the single-pixel community. ‘Of course, in applications where this movement-detection approach cannot be used to fix the foveated region, other approaches should be explored, such as CS. Also, if high spatial resolution is needed in all regions of the scene, the method presents some difficulties.’
In contrast to conventional multi-pixel cameras, single-pixel cameras capture images using a single detector that measures the correlations between the scene and a set of patterns. The patterns can either be projected onto the scene – structured illumination – or used to passively mask an image of the scene – structured detection. Researchers can then choose different reconstruction algorithms to fuse the information from multiple subframes and recover an improved estimate of the original scene.
However, single-pixel systems typically display low frame rates, because fully sampling a scene in this way requires at least as many correlation measurements as there are pixels in the reconstructed image. This is a limitation for video, since higher frame rates produce smoother fast-action scenes.
‘In order to obtain an NxN image [where N is the number of pixels per side], NxN patterns need to be generated, which takes time,’ explains Lancis Saez. To get around this problem, researchers have developed several techniques, such as compressive sensing (CS). However, he says, CS requires the use of ‘demanding computational algorithms’, which makes it difficult to implement in real-time applications.
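The scaling Lancis Saez describes is easy to quantify. With illustrative numbers – an assumed, order-of-magnitude pattern rate, not a figure from the paper – the fully sampled frame rate falls off with the square of the linear resolution:

```python
# Frame-rate arithmetic for a fully sampled single-pixel camera:
# an N x N image needs N*N pattern measurements per frame, so the
# frame rate is the pattern rate divided by N*N. The pattern rate
# below is an assumed, order-of-magnitude DMD figure.
pattern_rate = 20_000  # patterns displayed per second (assumption)
for n in (32, 64, 128):
    fps = pattern_rate / (n * n)
    print(f"{n}x{n} image: {fps:.1f} frames per second")
```

Doubling the linear resolution quarters the frame rate, which is why a spatially varying pixel budget – or compressive sensing – is needed to reach video rates.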