This page gives a recipe for improving the accuracy of the standard Sphinx2 distribution by combining your own speaker-dependent training with speaker-independent data (via hub4).
Do this:
export SPHINXTRAINDIR=/usr/local/src/SphinxTrain
mkdir time
cd time
$SPHINXTRAINDIR/scripts_pl/setup_SphinxTrain time
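Put the transcription in etc/time.transcription:
SIL THIS IS A TEST OF THE EMERGENCY BROADCAST SYSTEM PERIOD SIL EMERGENCY BROADCASTS ARE ONLY TEMPORARY PERIOD SIL (time0001)
and the matching utterance id in etc/time.fileids:
time0001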
(be careful not to put any extra spaces or blank lines).
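Stray whitespace and mismatched ids are easy to miss by eye, so a quick check before training may help. The following is only a sketch: check_ids is a hypothetical helper, not part of SphinxTrain, and it assumes each transcription line ends with its utterance id in parentheses, as in the example above.

```shell
#!/bin/sh
# check_ids is a hypothetical helper, not part of SphinxTrain.
# Usage: check_ids etc/time.transcription etc/time.fileids
check_ids() {
    trans=$1
    ids=$2
    if [ ! -r "$trans" ] || [ ! -r "$ids" ]; then
        echo "check_ids: cannot read $trans or $ids" >&2
        return 1
    fi
    # Blank lines or trailing whitespace in either file are reported as errors.
    if grep -n -E '^$|[[:space:]]$' "$trans" "$ids" >&2; then
        echo "check_ids: blank line or trailing whitespace found" >&2
        return 1
    fi
    # Pull the utterance id out of the final "(id)" on each transcription line.
    found=$(sed -n 's/.*(\([^()]*\))$/\1/p' "$trans")
    wanted=$(cat "$ids")
    if [ "$found" != "$wanted" ]; then
        echo "check_ids: ids in $trans do not match $ids" >&2
        return 1
    fi
    return 0
}
```

Run it from the time directory as: check_ids etc/time.transcription etc/time.fileids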
The filler dictionary should contain:
<s> SIL
</s> SIL
<sil> SIL
++BREATH++ +BREATH+
++COUGH++ +COUGH+
++SMACK++ +SMACK+
++UH++ +UH+
++UM++ +UM+
You may need to delete the "++NOISE++" line to get rid of warnings.
% ecasound -i:/dev/dsp -o wav/time0001.wav
When recording this file, say the following:
"This is a test of the emergency broadcast system period" pause "Emergency broadcasts are only temporary period"
variances -> /usr/local/src/hub4-2000-11-17-1/sphinx_3_format/variances
transition_matrices -> /usr/local/src/hub4-2000-11-17-1/sphinx_3_format/transition_matrices
newfe.6000.mdef -> /usr/local/src/hub4-2000-11-17-1/sphinx_3_format/newfe.6000.mdef
mixture_weights -> /usr/local/src/hub4-2000-11-17-1/sphinx_3_format/mixture_weights
means -> /usr/local/src/hub4-2000-11-17-1/sphinx_3_format/means
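One way to create these links is a small loop. This is a sketch, not the original author's commands: link_hub4 is a hypothetical helper, and the destination directory model_parameters/hub4 is an assumption taken from the $s3hmmdir setting used later on this page. Pass the source directory as an absolute path so the links resolve from anywhere.

```shell
#!/bin/sh
# link_hub4 is a hypothetical helper, not part of SphinxTrain.
# Usage: link_hub4 /usr/local/src/hub4-2000-11-17-1/sphinx_3_format model_parameters/hub4
link_hub4() {
    src=$1   # directory holding the hub4 Sphinx3-format files (absolute path)
    dst=$2   # directory that will hold the symlinks (assumed: model_parameters/hub4)
    files="variances transition_matrices newfe.6000.mdef mixture_weights means"
    # First pass: refuse to link anything if a source file is missing.
    for f in $files; do
        if [ ! -e "$src/$f" ]; then
            echo "link_hub4: missing $src/$f" >&2
            return 1
        fi
    done
    mkdir -p "$dst" || return 1
    # Second pass: create (or refresh) one symlink per file.
    for f in $files; do
        ln -sf "$src/$f" "$dst/$f" || return 1
    done
}
```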
note: jphekman went through this process without the newfe.6000.mdef link, and things worked. Is it actually used?
Replace
$s3mixw = "$s3hmmdir/mixture_weights";
with:
$s3mixw = "$CFG_BASE_DIR/model_parameters/$CFG_EXPTNAME.cd_semi_$CFG_N_TIED_STATES"."_interp/mixture_weights";
Also, replace
$s3hmmdir="$CFG_BASE_DIR/model_parameters/$CFG_EXPTNAME.cd_semi_$CFG_N_TIED_STATES"."_delinterp";
with:
$s3hmmdir="model_parameters/hub4";
Change
bin/wave2feat -verbose -c $1 -nist -di wav -ei wav -do feat -eo feat
to:
bin/wave2feat -verbose -c $1 -raw -di wav -ei wav -do feat -eo feat
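If you prefer not to edit by hand, the same change can be made with sed. This is a minimal sketch: use_raw_input is a hypothetical helper, and the path of the script that contains the wave2feat line is passed as an argument because it depends on your setup. A backup copy is kept with a .bak suffix.

```shell
#!/bin/sh
# use_raw_input is a hypothetical helper, not part of SphinxTrain.
# Usage: use_raw_input path/to/script-containing-the-wave2feat-line
use_raw_input() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "use_raw_input: no such file: $file" >&2
        return 1
    fi
    # Keep a backup, then rewrite the original in place so its permissions survive.
    cp -p "$file" "$file.bak" || return 1
    # Only touch lines that invoke wave2feat; swap the -nist flag for -raw.
    sed '/wave2feat/s/ -nist / -raw /' "$file.bak" > "$file"
}
```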
Then, run this script (builds2model) in the time directory:
#!/bin/sh -xv
rm time.html
rm model_architecture/time.[a-z]*
bin/make_dict etc/time.transcription
#
# Will create the etc/word.known and etc/word.unknown files. Check them; once you are happy,
#
mv etc/word.known etc/time.dic
#
# Make the melcep feature files
#
bin/make_feats etc/time.fileids
#
# Now we can start on the basic perl scripts. Their results will be put in
# perl_time.html, which you can view as things progress. Note the scripts aren't
# guaranteed, and problems do occur, though often the error message is actually
# indicative of the error.
#
# There are a number of larger choices in building models; one is whether to build
# continuous or semi-continuous models. Only semi-continuous models can be used by Sphinx2.
#
# Doesn't do enough checking, but may help
export SPHINXTRAINDIR=.
./scripts_pl/00.verify/verify_all.pl
# Can take several minutes
rm -r model_parameters/time.*[a-z]
rm -r logdir/0*[a-z]
PATH=/bin:$PATH ./scripts_pl/01.vector_quantize/slave.VQ.pl
# Initial BW training
PATH=/bin:$PATH ./scripts_pl/02.ci_schmm/slave_convg.pl
./scripts_pl/03.makeuntiedmdef/make_untied_mdef.pl
PATH=/bin:$PATH ./scripts_pl/04.cd_schmm_untied/slave_convg.pl
./scripts_pl/05.buildtrees/make_questions.pl
./scripts_pl/06.prunetree/slave.state-tie-er.pl
PATH=/bin:$PATH ./scripts_pl/07.cd-schmm/slave_convg.pl
PATH=/bin:$PATH ./scripts_pl/08.deleted-interpolation/deleted_interpolation.pl
# Interpolate the newly trained mixture weights with the hub4 weights.
# The output goes into the directory created by mkdir, which is where $s3mixw points.
mkdir model_parameters/time.cd_semi_6000_interp
./bin/mixw_interp \
    -SImixwfn model_parameters/time.cd_semi_6000_delinterp/mixture_weights \
    -SDmixwfn model_parameters/hub4/mixture_weights \
    -SIlambda 0.1 \
    -outmixwfn model_parameters/time.cd_semi_6000_interp/mixture_weights
./scripts_pl/09.make_s2_models/make_s2_models.pl
Errors will be put in 'train.log'.
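To get a feel for what the -SIlambda 0.1 setting in the mixw_interp step does, here is a small worked example. It assumes mixw_interp forms a plain linear interpolation, lambda * SI + (1 - lambda) * SD, for each mixture weight; that formula is an assumption about the tool, so check it against the mixw_interp help output for your SphinxTrain version. mixw_blend is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# mixw_blend is a hypothetical illustration, not part of SphinxTrain.
# Usage: mixw_blend SI_WEIGHT SD_WEIGHT LAMBDA
# Assumed formula: out = lambda * SI + (1 - lambda) * SD
mixw_blend() {
    # LC_ALL=C keeps the decimal point a "." regardless of locale.
    LC_ALL=C awk -v si="$1" -v sd="$2" -v l="$3" \
        'BEGIN { printf "%.3f\n", l * si + (1 - l) * sd }'
}

# One weight of 0.2 from the -SImixwfn file and 0.6 from the -SDmixwfn file,
# blended with lambda = 0.1:
mixw_blend 0.2 0.6 0.1   # prints 0.560
```

Under this assumption, the file passed as -SImixwfn contributes only 10% of each resulting weight and the file passed as -SDmixwfn contributes the remaining 90%.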