Currently, our compound model "Dico v.0.9" provides a reasonably complete acoustic description of vowel production. It is based on the acoustics of the vocal tract and subglottal cavities. The acoustic Webster's resonators (based on area functions and Webster's horn model) are coupled to an incompressible and non-turbulent scalar flow model that excites a low-order, non-symmetric mass-spring model of the vocal folds.
The model consists of two parts: the glottal pulse generator and the 3D acoustic simulator.
Feedback couplings from both Webster's resonators to the vocal folds masses have been realised, and separate models for acoustic, viscous, and turbulent energy losses as well as losses to tissues have been incorporated. They appear as terms in the respective (partial) differential equations.
The model has been so designed that vocal tract geometries from MRI can be easily introduced for simulations of the acoustic velocity potential. The sound pressure and perturbation velocity profiles can be obtained from the potential.
Because cross-mode resonances of the vocal tract must be captured, the final simulated signal is produced by a separate 3D acoustic simulator. It is a 3D wave equation solver that is boundary controlled by the same glottal signals that feed the vocal tract Webster's resonator. The area function for Webster's resonator corresponds to the vocal tract geometry of the wave equation resonator during simulations. Furthermore, the mouth termination of Webster's model is tuned to coincide the longitudinal resonance structure of the wave equation resonator, equipped with an exterior space model. This is ongoing work based on measurement data, and it requires a lot of preparations with MRI image post-processing as well as signal processing of measured speech.
The glottal pulse generator is not intended to be a physically realistic description of the glottal flow in the sense that flow separation, jet formation, and realistic vocal folds geometries are not included. Instead, it is designed to serve as a validated, tunable source of glottis pulse which adapts to changing vocal tract geometries and specifications of the phonation type. Part of the model parameters can be chosen based on physical material parameters but there are some parameters (in particular, in the glottal flow submodel) that must be tuned to obtain the desired glottal pulse form.
FEM is used for the spatial discretisation of vocal tract geometries in all parts of the model.
High resolution modelling and model validation require large datasets from a large number of subjects in various phonetical configurations. To achieve high consistency of the data, speech samples and MRI must be collected simultaneously in the extremely difficult environment that the MRI room is. Special acoustic and electronic instrumentation has been developed in the project, aiming at the best possible final quality of speech signals as well as artefact-free MRI.
The experimental system is highly automated, and it is able to trigger the MRI machine (Siemens Avanto 1.5T) in sync with the rest of the experimental arrangements. It requires two persons to run the experiments: one setting the operating parameters of the MRI machine and the other running the experiment control system software.
Using this measurement system, it is possible to record about 100 combined MRI- and speech samples during a session of about 1h. Most of the samples are steady vowels (using 3D MRI) but also some phonetically interesting sentences are being imaged (using dynamic sagittal plane scans).
The MRI machine produces a stack of bitmaps stored in a DICOM file. The automatic extraction and artefact removal of the tissue-air interface from these images is a non-trivial task. Our custom software produces surface models that can be used for generating FEM meshes as well as vocal tract area functions. All this can be done with minimal manual effort, and thus large data sets can be processed.
The vocal tract surface extraction code produces surface models as STL files. Such files can be 3D printed at low cost by, e.g., Fused Deposition Modelling (FDM) printers such as the Ultimaker. We have experimented with Ultimaker hardware to produce objects from the vocal tract in all vowel configurations. Even though the anatomic geometry is quite challenging to print using FDM technology, the approach seems promising from both price and quality points of views.
3D printed vocal tract models are expected to be useful in planning oral and maxillofacial surgery just as dental casts have been used in orthodontics for decades. And, of course, they will produce an audible vowel sound if you excite them acoustically at the glottis position.
The main purpose of the speech data post-processing is to get rid of the acoustic noise, generated by the MRI machine, that contaminates the samples. A significant noise reduction is achieved by the dipole configuration of the sound collector but digital signal processing is needed as well. Our solution is based on 1) the compensation of measured frequency responses of the system, and 2) adaptive spectral filtering using the acoustic noise profiles produced by the MRI sequences.
We get almost clean steady vowel samples but the outcome of the dynamic speech samples is not quite as good.
After many rounds of pilot MRI experiments for finishing up the experimental system, the first data from patients were obtained in autumn 2012. These subjects are patients of oral and maxillofacial surgery in Turku University Hospital. Each patient will be imaged twice: pre- and post-operatively.
The patient data can only be used within the strong restrictions imposed by the Research Ethics Board. We are, however, able to provide some separate datasets for educational use and academic research here.
Teeth are not visible in MRI due to their low hydrogen content but they are an important acoustic element of the vocal tract. Hence, it is necessary to add separately imaged teeth geometry into the soft tissue geometry.
Optical scans of teeth or digitalised dental casts can be readily obtained from the patients but the alignment of the two geometries is a non-trivial problem. Markers containing vegetable oil, attached to the surface of the teeth, appear to be a practical approach that produces sufficient MRI visibility.