Human beings can imagine a person’s voice from his or her appearance, because different people have different voice characteristics. Although researchers have made great progress in single-view speech synthesis, there are few studies on multi-view speech synthesis, especially speech synthesis from face images. Building on the implicit relationship between a speaker’s face image and his or her voice, we propose a multi-view speech synthesis method called SSFE (Speech Synthesis with Face Embeddings). The proposed SSFE consists of three parts: a voice encoder, a face encoder and an improved multi-speaker text-to-speech (TTS) engine. The voice encoder generates voice embeddings from the speaker’s speech, while the face encoder extracts voice features from the speaker’s face image as f-voice embeddings. The multi-speaker TTS engine then synthesizes speech from the voice embeddings and the f-voice embeddings. We have conducted extensive experiments to evaluate SSFE in terms of synthesized speech quality and face-voice matching degree: the Mean Opinion Score (MOS) of SSFE exceeds 3.7, and its matching degree is about 1.7. The experimental results show that SSFE outperforms state-of-the-art methods in both speech quality and face-voice matching degree.
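The abstract does not say how the improved TTS engine consumes the embeddings. A minimal sketch, assuming a common multi-speaker TTS design (as in SV2TTS-style systems) in which a fixed-length speaker embedding is L2-normalized and concatenated to every frame of the text-encoder output. The function names `l2_normalize` and `condition_on_speaker` are hypothetical, plain Python lists stand in for tensors, and the idea that voice and f-voice embeddings share one embedding space is an assumption, not a detail confirmed by the source.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a speaker embedding to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero embedding")
    return [x / norm for x in vector]


def condition_on_speaker(
    encoder_frames: Sequence[Sequence[float]],
    speaker_embedding: Sequence[float],
) -> List[List[float]]:
    """Append the same normalized speaker embedding to every text-encoder frame.

    Hypothetical: `speaker_embedding` may be a voice embedding (from reference
    speech) or, assuming both encoders map into one shared space, an f-voice
    embedding predicted from a face image.
    """
    embedding = l2_normalize(speaker_embedding)
    return [list(frame) + embedding for frame in encoder_frames]


if __name__ == "__main__":
    text_frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, 2 features each
    f_voice = [3.0, 4.0]  # toy face-derived embedding
    conditioned = condition_on_speaker(text_frames, f_voice)
    print(conditioned[0])  # [0.1, 0.2, 0.6, 0.8]
```

Concatenating per frame lets the decoder attend to speaker identity at every time step; under this assumption, swapping a voice embedding for an f-voice embedding requires no change to the TTS engine itself.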
Applied Intelligence - Face aging is of great significance for cross-time identity verification. However, there is still a large gap between synthesized face images and real faces in...
Flush toilets waste a significant amount of water every day because human waste inevitably adheres to toilet surfaces. Super-slippery surfaces can repel complex fluids and various viscoelastic solids; however, they are easily damaged by mechanical abrasion. Herein, the fabrication of an abrasion-resistant super-slippery flush toilet (ARSFT) by selective laser sintering 3D printing is reported. Unlike traditional super-slippery surfaces, whose limited thickness allows them to be worn away easily, the powder-sintering strategy gives the ARSFT both a self-supporting, complex 3D shape and a porous structure that can hold a considerable amount of lubricant, yielding an abrasion-resistant super-slippery property. As a result, the as-prepared ARSFT remains clean after contact with various liquids such as milk, yogurt, highly sticky honey, and congee mixed with starch gel, demonstrating excellent repellence to complex fluids. Beyond liquids, the ARSFT also strongly resists sticky synthetic feces. Notably, even after 1,000 cycles of sandpaper abrasion, the ARSFT retains its record-breaking super-slippery capability. The concept of 3D-printed objects with superior abrasion-resistant slipperiness is expected to advance the development of super-slippery materials and to further reduce water consumption in human society.
Abstract— In this paper, several methods to characterize motion blur on liquid‐crystal displays (LCDs) are reviewed. Based on the assumptions of smooth‐pursuit eye tracking and one‐frame temporal luminance integration, a simple algorithm is proposed to calculate the normalized blurred edge width (N‐BEW) and motion‐picture response time (MPRT) by applying a one‐frame‐time moving‐window function to LC temporal step‐response curves. A custom measurement system with a fast, eye‐sensitivity‐compensated photodiode has been developed to characterize motion blur from LC response curves (LCRCs). The MPRT values obtained with this algorithm agree with those from smooth‐pursuit‐camera methods. Perception experiments were conducted to validate the correspondence between the simulated results and the images actually perceived by human eyes. In addition, by analyzing measurement results from a scanning‐backlight LCD, the paper discusses why MPRT is insufficient for evaluating motion blur on LCDs with impulse‐type light generation.
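The moving-window computation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a uniformly sampled step response, the common 10%–90% convention for the blurred edge, and that MPRT equals the resulting blurred-edge time while N-BEW expresses the same width in frame times (BEW divided by the motion speed in pixels per frame). The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def moving_window_average(samples: Sequence[float], window: int) -> List[float]:
    """One-frame moving window: out[i] is the mean of the `window` samples ending at i.

    Samples before the start are assumed to stay at the initial level samples[0].
    """
    if window < 1:
        raise ValueError("window must be at least one sample")
    padded = [samples[0]] * (window - 1) + list(samples)
    running = sum(padded[:window])
    out = [running / window]
    for i in range(window, len(padded)):
        running += padded[i] - padded[i - window]  # slide the window by one sample
        out.append(running / window)
    return out


def crossing_time(times: Sequence[float], curve: Sequence[float], level: float) -> float:
    """Linearly interpolated time at which a rising normalized curve first reaches `level`."""
    for i in range(1, len(curve)):
        if curve[i - 1] < level <= curve[i]:
            frac = (level - curve[i - 1]) / (curve[i] - curve[i - 1])
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("curve never reaches the requested level")


def motion_blur_metrics(
    response: Sequence[float],
    dt_ms: float,
    frame_time_ms: float,
    low: float = 0.1,
    high: float = 0.9,
) -> Dict[str, float]:
    """Blur a sampled LC step response with a one-frame window and measure its edge."""
    window = round(frame_time_ms / dt_ms)
    blurred = moving_window_average(response, window)
    start, end = response[0], response[-1]
    if start == end:
        raise ValueError("response must contain a luminance transition")
    # Normalize so both rising and falling transitions go from 0 to 1.
    normalized = [(b - start) / (end - start) for b in blurred]
    times = [i * dt_ms for i in range(len(blurred))]
    bet_ms = crossing_time(times, normalized, high) - crossing_time(times, normalized, low)
    return {"mprt_ms": bet_ms, "n_bew_frames": bet_ms / frame_time_ms}


if __name__ == "__main__":
    # Ideal (instantaneous) LC response stepping at t = 5 ms, sampled every 0.1 ms.
    ideal = [0.0 if i < 50 else 1.0 for i in range(400)]
    print(motion_blur_metrics(ideal, dt_ms=0.1, frame_time_ms=16.0))
```

For an ideal step with zero response time, the averaged edge is a linear ramp lasting one frame, so the 10%–90% width is 0.8 frame (about 12.8 ms at a 16 ms frame time); any finite LC response time widens it further.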