Introduction
Even though people with disabilities are more respected nowadays thanks to the Disability Rights Movement [1], many are still taken advantage of on a daily basis. Lacking vision, the blind have the greatest difficulty in interacting with those who have ill intentions. They are often deceived by a speaker’s tone and voice, since they are unable to recognize the person’s real emotions through vision. Our proposed technology, ISight, is a pair of glasses able to convert a target’s feelings into related music that informs the user of his or her emotions. This report analyzes the background technology that will be used in the development, as well as the technical details and feasibility of the product.
Background / Literature Review
Currently, emotions can be detected by using artificial intelligence and machine learning through voice, text, and facial expressions:
Voice and Speech Pattern Processing
By analyzing the primary characteristics of one’s speech, emotions can be extrapolated from existing data. The main attributes taken into account are pitch, loudness, timbre, and tone [2]. These features serve as parameters to an algorithm that computes the emotion as the closest match among large samples collected through data mining. Since humans often express emotions using distinct vocal techniques, the results can be very accurate given a large enough sample space. Nevertheless, background noise may significantly distort the captured speech, leading to misinterpretation of the corresponding emotions. As a result, this method is not suitable for public areas that are prone to noise.
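The closest-match idea described above can be sketched as a nearest-neighbor lookup over speech features. The emotion labels, feature choice (pitch and loudness only), and the reference values below are purely illustrative assumptions, not the profiles a real data-mined corpus would produce:

```python
import numpy as np

# Hypothetical reference profiles: mean (pitch in Hz, loudness in dB) per
# emotion. In practice these would be mined from a large labeled corpus.
REFERENCE_PROFILES = {
    "happy":   np.array([220.0, 65.0]),
    "sad":     np.array([140.0, 50.0]),
    "angry":   np.array([260.0, 75.0]),
    "neutral": np.array([180.0, 60.0]),
}

def classify_emotion(pitch_hz, loudness_db):
    """Return the emotion whose reference profile is closest
    (Euclidean distance) to the measured speech features."""
    sample = np.array([pitch_hz, loudness_db])
    return min(REFERENCE_PROFILES,
               key=lambda e: np.linalg.norm(sample - REFERENCE_PROFILES[e]))

print(classify_emotion(225.0, 66.0))  # closest to the "happy" profile
```

A production system would use many more features (timbre, tone contour) and far larger sample sets, but the matching step follows this shape.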
Sentiment Analysis
In contrast, sentiment analysis is not affected by background sounds, because the content of a dialogue can be extracted cleanly after noise reduction [3]. This model determines the emotion of sentences and paragraphs by assessing the specific words and phrases used [4]. Based on a dictionary with a predetermined polarity and strength for each word, the algorithm captures the overall feeling of the combined text. Even though this method can achieve high accuracy given sufficient context, it still falls short of human judgment in sentiment evaluation, as shown by a contest held by Visible [5]. Therefore, this method does not provide any significant improvement to the conversations of the visually impaired.
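A minimal sketch of the lexicon-based scoring described above, in the spirit of the dictionary approach of [4]: the tiny lexicon, the polarity values, and the simple negation rule are assumptions for illustration only; real lexicons contain thousands of entries plus intensifier handling:

```python
# Hypothetical polarity lexicon: word -> signed strength.
LEXICON = {
    "great": 2.0, "good": 1.0, "fine": 0.5,
    "bad": -1.0, "terrible": -2.0,
}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum word polarities over the text, flipping the sign of the
    word that follows a negator (e.g. "not bad" scores +1.0)."""
    score, negate = 0.0, False
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in NEGATORS:
            negate = True
        elif word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    return score

print(sentiment_score("The service was not bad, actually great!"))  # 1.0 + 2.0 = 3.0
```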
Facial Expressions Recognition
Facial expressions are distinguished by first detecting a human face and then classifying emotions from the corresponding information. A face-tracking algorithm reads video data and outputs a range of motion features at specific locations on the face [6]. These outputs serve as inputs to a Bayesian network classifier. From the movements of distinctive facial features, such as the horizontal and vertical motion of the lips and brows, this classifier can correctly determine a person’s emotion with up to 90% accuracy. The Bayesian network classifier currently provides the best performance, as it is a framework that integrates multiple different models in a coherent manner. In addition, this type of recognition can easily collect the video data it requires during conversations, so it does not suffer from the noise-corruption issues that affect speech processing.
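To make the classification step concrete, here is a toy Gaussian naive Bayes classifier over two motion features (vertical lip movement, brow raise). This is a deliberate simplification of the Bayesian network framework of [6]: the feature set, the per-emotion means and variances, and the equal-prior assumption are all illustrative, not values from the cited work:

```python
import math

# Hypothetical per-emotion Gaussian parameters for two motion features:
# emotion -> [(mean, variance) for lip_raise, (mean, variance) for brow_raise].
# In a real system these would be estimated from labeled video data.
PARAMS = {
    "happy":    [(0.8, 0.04), (0.3, 0.04)],
    "surprise": [(0.5, 0.04), (0.9, 0.04)],
    "neutral":  [(0.1, 0.04), (0.1, 0.04)],
}

def log_likelihood(features, params):
    """Sum of per-feature Gaussian log-densities (naive independence)."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x, (mu, var) in zip(features, params))

def classify(features):
    """Pick the emotion maximizing the posterior under equal priors."""
    return max(PARAMS, key=lambda e: log_likelihood(features, PARAMS[e]))

print(classify([0.75, 0.25]))  # lips strongly raised, brows still -> "happy"
```

A full Bayesian network would also model dependencies between features rather than assuming independence, which is part of why it performs better.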
Since ISight is intended to help the visually impaired, facial expressions recognition equips them with a completely new ability compared to the other two methods. Unlike sound data, facial expressions are unaffected by environmental noise and can be conveniently captured during face-to-face interactions. Therefore, ISight will use facial expressions recognition to help the blind identify a target’s emotions during conversations.
Technical Description
![](https://static.wixstatic.com/media/24faeb_d3add47fe7ec447da73f3d374af81e60~mv2.jpg/v1/fill/w_624,h_314,al_c,q_80,enc_auto/24faeb_d3add47fe7ec447da73f3d374af81e60~mv2.jpg)
Fig. 1. Outlook and major components of ISight.
ISight is a device that enables the visually impaired to recognize others’ emotions through facial expressions. As shown in [7, Fig. 1], it is shaped like a normal pair of glasses, with dimensions of 5 inches wide and 8 inches long. The lenses are detachable and lock tight with buttons at the bridge connecting the two eye wires. The battery powering the device is a thin rectangular piece that fits within the left temple of the glasses, keeping each temple no more than 0.5 centimeters at its widest point. The battery can be charged through a USB port located on the left rim of the lenses. ISight contains two major components: a micro camera and bone-conduction earphones.
A high-definition micro camera with 1080p resolution is attached to the right rim of the lenses. The camera’s small size keeps the product’s weight at only 30 grams and makes it inconspicuous to others. The captured recording is processed by the facial expressions recognition software and converted into machine-generated songs that convey similar feelings [8].
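The conversion stage can be pictured as a lookup from a recognized emotion label to musical parameters fed into the music generator. The parameter names (`tempo_bpm`, `mode`) and the values below are hypothetical placeholders, chosen only to reflect the common association of fast tempo and major mode with positive emotions:

```python
# A minimal sketch of the emotion-to-music mapping stage, assuming the
# recognizer outputs one of a fixed set of emotion labels.
EMOTION_TO_MUSIC = {
    "happy":   {"tempo_bpm": 140, "mode": "major"},
    "sad":     {"tempo_bpm": 60,  "mode": "minor"},
    "angry":   {"tempo_bpm": 150, "mode": "minor"},
    "neutral": {"tempo_bpm": 100, "mode": "major"},
}

def music_parameters(emotion):
    """Look up playback parameters for a recognized emotion,
    falling back to neutral for unknown labels."""
    return EMOTION_TO_MUSIC.get(emotion, EMOTION_TO_MUSIC["neutral"])

print(music_parameters("sad"))  # {'tempo_bpm': 60, 'mode': 'minor'}
```

In the actual product these parameters would condition a neural music generator of the kind surveyed in [8], rather than select prerecorded songs.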
Attached to the inner surface near the rear ends of the temples are bone-conduction earphones that transmit sound directly to the user’s cochlea through vibration, as shown in [9, Fig. 2]. This way, the sound waves bypass the outer and middle ear, leaving the eardrums free to receive other sounds. As a result, the music does not interfere with the user’s reception of the actual content of the conversation.
![](https://static.wixstatic.com/media/24faeb_709f0e5086084ff399b4257df961d042~mv2.png/v1/fill/w_934,h_644,al_c,q_90,enc_auto/24faeb_709f0e5086084ff399b4257df961d042~mv2.png)
Fig. 2. Illustration of how bone conduction works.
Feasibility / Benefits
ISight provides the blind with the ability to distinguish others’ facial expressions, something that has never been achieved before. This can significantly improve the social and economic lives of disabled citizens, as the additional information lets them communicate more confidently. In addition, they may avoid scams and deceivers by recognizing ill intentions from facial gestures. With these benefits, the unsighted will be able to interact with others without being hindered by their impairment.
One major concern for ISight is that users without musical knowledge may not identify emotions accurately from the tunes. According to research by Dr. Patrik N. Juslin in the field of psychology, musical training has almost no effect on one’s ability to recognize the emotion that a piece of music conveys [10]. Instead, perceiving emotion in music relies on emotional intelligence, the same ability applied to facial and vocal expressions [11]. Thus, the use of music in ISight closely imitates the information one would obtain by observing another’s face.
ISight is made of small components, making it light and comfortable to wear. Since many blind people wear sunglasses to protect their eyes from sunlight and physical dangers [12], ISight is a convenient replacement that provides additional functionality without adding anything extra to what they already wear. The use of bone-conduction earphones lets the user pick up the music without interference, so users can still hear everything around them and face no further safety concerns from using the product. Privacy is also not an issue, because ISight’s program is offline software and provides no recording functionality. All data collected is processed in real time and is never saved.
The cost of developing ISight is estimated at around five hundred U.S. dollars. This includes the costs of the micro camera, bone-conduction earphones, detachable lenses, and the main software. Most designer sunglasses currently sell for about three hundred U.S. dollars even though their production costs are far lower [13]. Therefore, ISight is reasonably priced given its advanced functionality and components. It is expected to be a useful and successful product welcomed by those with different types of vision impairment.
Because of ISight’s small battery, the average continuous usage time is only approximately three hours. However, this limitation is not significant because the device is rechargeable via a USB port. Besides charging from vehicles and computers, many places also offer USB charging areas for public use. Moreover, portable power banks can fully charge ISight multiple times throughout the day. Despite its short battery life, the product can function for the majority of the day thanks to its universal recharging method.
Conclusion
Mistreatment of the disabled persists even as awareness of the issue grows; they are still taken advantage of all the time. The unsighted have to guess others’ intentions based solely on what they say and are often deceived as a result. To improve the lives of the visually impaired, ISight gives them the ability to determine others’ emotions using information obtained from facial expressions. These facial gestures are converted into music of a corresponding mood, allowing the blind to identify the target’s true feelings. This gives them more confidence in conversing with others, since they will have the same understanding of the target as those who can see.
References
[1] "A Brief History of the Disability Rights Movement", Anti-Defamation League. [Online]. Available: https://www.adl.org/education/resources/backgrounders/disability-rights-movement.
[2] P. Dasgupta, "Detection and Analysis of Human Emotions through Voice and Speech Pattern Processing", Arxiv.org, 2017. [Online]. Available: https://arxiv.org/ftp/arxiv/papers/1710/1710.10198.pdf.
[3] "Noise Reduction - Audacity Development Manual", Manual.audacityteam.org. [Online]. Available: https://manual.audacityteam.org/man/noise_reduction.html.
[4] M. Taboada, J. Brooke, M. Tofiloski, K. Voll and M. Stede, "Lexicon-Based Methods for Sentiment Analysis", Mitpressjournals.org, 2011. [Online]. Available: https://www.mitpressjournals.org/doi/pdfplus/10.1162/COLI_a_00049.
[5] S. Rutledge, "Humans vs. AI in a Sentiment Bout. And the Winner is…", Cision, 2011. [Online]. Available: https://www.cision.com/us/2011/04/humans-vs-ai-in-a-sentiment-bout-and-the-winner-is/.
[6] Machine Learning in Computer Vision. Dordrecht: Springer, 2005, pp. 187-209.
[7] "HD 720P Aviator Sunglasses Spy Cam", Spy Emporium. [Online]. Available: http://www.spyemporium.com/body-worn-cameras/hd-720p-aviator-sunglasses-spy-cam/.
[8] K. McDonald, "Neural Nets for Generating Music – Artists and Machine Intelligence – Medium", Medium, 2017. [Online]. Available: https://medium.com/artists-and-machine-intelligence/neural-nets-for-generating-music-f46dffac21c0.
[9] "What is bone conduction? - Bone conduction", Bone conduction, 2017. [Online]. Available: http://www.bone-conduction.com/en/what-is-bone-conduction/.
[10] P. Juslin, Emotional communication in music performance. 1997, pp. 383–418.
[11] J. Resnicow, P. Salovey and B. Repp, "Music Perception: An Interdisciplinary Journal", Recognizing Emotion in Music Performance Is an Aspect of Emotional Intelligence, 2004. [Online]. Available: http://ei.yale.edu/wp-content/uploads/2014/01/pub60_ResnicowSalovey2004_MusicandEI.pdf.
[12] "Why Do Blind People Wear Dark Glasses?", Scienceabc.com, 2018. [Online]. Available: https://www.scienceabc.com/eyeopeners/why-do-blind-people-wear-dark-glasses.html.
[13] D. Myrland and H. Crook, "Why Do We Pay Hundreds for Shades that Cost $3 to Make?", KPBS Public Media, 2009. [Online]. Available: http://www.kpbs.org/news/2009/jun/22/why-do-we-pay-hundreds-shades-cost-3-make/.