- Speech Recognition
- Speech Synthesis
- Dialog Systems
Speech Processing offers a practical and theoretical understanding of how human speech can be processed by computers. It covers speech recognition, speech synthesis, and spoken dialog systems. The course involves practicals in which students build working speech recognition systems, their own synthetic voice, and a complete telephone spoken dialog system. This work will be based on existing toolkits. Details of the algorithms, techniques, and limitations of state-of-the-art speech systems will also be presented. This course is designed for students who wish to understand how to process real data for real applications, applying statistical and machine learning techniques while working within the limitations of the technology.
| Prerequisites | 15-211 for SCS undergraduates. Exemption from this requirement requires the instructor's permission. |
| Availability | Open to juniors and seniors in the SCS and ECE undergraduate programs. Open to other students with the consent of an instructor. |
| Textbook | "Spoken Language Processing" by Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, Prentice Hall (ISBN 0-13-022616-5) |
| Homework | Homework consists of two components: occasional brief weekly reading assignments and four programming projects (Speech Recognition, Speech Synthesis, Spoken Dialog Systems, and one other). |
| Grading | |
| Course policies | |
| Research | We are conducting research in this course, and you can participate in it! This course is about speech technologies: you will learn about speech recognition, dialog systems, and speech synthesis (text-to-speech). One topic you will learn about is the understandability of text-to-speech. In speech synthesis, we build models of human speech so that, given some text, a computer can read it aloud. Most of us have used text-to-speech applications before: voice-guided GPS navigation and telephone dialog systems, including the Let's Go system used by the Pittsburgh Port Authority, are some examples. One of the goals of speech synthesis is to make voices more understandable. To understand what makes voices better, we are conducting research in this course, 15-492/18-492, this semester (Fall 2012). Some of the assignments in this class have been chosen for our research study. These will be available in two versions: text-based, as presented traditionally, and speech-synthesis-based. Students who choose to participate in the study will use the speech-based version of the assignment or reading and turn in their responses. Students who opt out will turn in their responses for the equivalent text-based version. Participation is voluntary and decided on a per-assignment basis, so students can opt in or opt out at any time. Students will not receive any financial benefit from participating in the study. Participants receive credit for their assignment responses if they are correct, and students who opt out solve equivalent assignments, so there is no added bonus or loss from participating in the study. Only students of 15-492/18-492 are eligible to participate. Participants must be between 18 and 35 years old and must have normal hearing ability. |