00:01 In this demo, you’ll create a multimodal language tutor app using Gradio. The app will simulate conversational scenarios, allowing users to practice their English skills interactively. It will display images, play audio prompts, and let users respond via recorded speech. It will then update the conversation, generate new images, and provide audio feedback based on the user’s input.
00:26 Start by defining the seed prompt for the initial situational context. Generate the initial situational description and corresponding image using the generate_situational_prompt function from the previous demo. Remember that the Jupyter Lab code you see now is in the same Jupyter Lab file you worked on in the last demo.
# Build the multimodal language tutor app using Gradio
# Initial seed prompt for generating the initial situational context
seed_prompt = "cafe near beach"  # or "comics exhibition", "meeting parents-in-law for the first time", etc.
# Generate an initial situational description based on the seed prompt
initial_situation = generate_situational_prompt(seed_prompt)
# Generate an initial image based on the initial situational description
img = generate_situation_image(initial_situation)
# Flags to manage the state of the app
first_time = True
combined_history = ""
00:48 Define a helper function to extract the first and last segments of the conversation history. This ensures the prompt for DALL-E does not exceed the maximum character limit. Add the function to the end of the code cell:
# Function to extract the first and last segments of the conversation
# history
# This is to ensure that the prompt for DALL-E does not exceed the
# maximum character limit of 4000 characters
def extract_first_last(text):
    elements = [elem.strip() for elem in text.split('====')
                if elem.strip()]
    if len(elements) >= 2:
        return elements[0] + elements[-1]
    elif len(elements) == 1:
        return elements[0]
    else:
        return ""
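To see what the helper keeps and what it drops, here is a small self-contained check. The sample history string is made up for illustration, and the function body mirrors the one above. Note that the two kept segments are joined without a separator, and the helper itself does not enforce any length cap, so the final truncation step is an added assumption rather than part of the demo.

```python
# Illustrative check of extract_first_last; the sample history is made up.
MAX_PROMPT_CHARS = 4000  # limit quoted in the demo's code comment


def extract_first_last(text):
    elements = [elem.strip() for elem in text.split('====') if elem.strip()]
    if len(elements) >= 2:
        return elements[0] + elements[-1]
    elif len(elements) == 1:
        return elements[0]
    else:
        return ""


sample_history = (
    "You are at a cafe near the beach.\n"
    "====\n"
    "User: Yes, I do. What about you?\n"
    "====\n"
    "Tutor: I love the sound of the waves."
)

prompt = extract_first_last(sample_history)
# Hypothetical extra safeguard: hard-truncate in case both segments are long.
prompt = prompt[:MAX_PROMPT_CHARS]
print(prompt)
```

The middle segment (the user's first reply) is dropped, so the image prompt only reflects the opening situation and the most recent turn.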
01:01 Define the main function conversation_generation to handle the conversation logic. This function will transcribe the user’s speech, update the conversation history, generate a new conversation response, and update the visual and audio outputs. Add the function to the code cell:
# Main function to handle the conversation generation logic
def conversation_generation(audio_path):
    global combined_history
    global first_time
    # Transcribe the user's speech from the provided audio file path
    transcripted_text = transcript_speech(audio_path)
    # Create conversation history based on whether it is the first
    # interaction or not
    if first_time:
        history = creating_conversation_history(initial_situation,
                                                transcripted_text)
        first_time = False
    else:
        history = creating_conversation_history(combined_history,
                                                transcripted_text)
    # Generate a new conversation based on the updated history
    conversation = generate_conversation_from_history(history)
    # Update the combined history with the new conversation
    combined_history = history + "\n====\n" + conversation
    # Extract a suitable prompt for DALL-E by combining the first
    # and last parts of the conversation history
    dalle_prompt = extract_first_last(combined_history)
    # Generate a new image from the shortened prompt, so the DALL-E
    # character limit is not exceeded
    img = generate_situation_image(dalle_prompt)
    # Generate speech for the new conversation and save it to an
    # audio file
    output_audio_file = "speak_speech.mp3"
    speak_prompt(conversation, False, output_audio_file)
    # Return the updated image, conversation text, and audio file
    # path
    return img, conversation, output_audio_file
01:22 This function, conversation_generation, handles the conversation logic for the app. It starts by transcribing the user’s speech from the provided audio file path. Based on whether it’s the first interaction, it creates the conversation history accordingly. It then generates a new conversation response using the updated history and updates the combined history. The function extracts a suitable prompt for generating a new image based on the conversation history, generates the image, and produces speech for the new conversation, saving it to an audio file. Finally, it returns the updated image, conversation text, and audio file path.
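The way the history accumulates across turns can be traced without calling any model. The sketch below replaces the speech, history, and chat helpers with hypothetical stand-ins (their real implementations come from earlier demos and are not shown here), and it assumes that creating_conversation_history appends the new utterance after a `====` separator. Only the turn-handling logic matches the function described above.

```python
# Sketch of how the history grows across turns. The three helpers below are
# hypothetical stand-ins for the demo's speech, history, and chat functions.
def transcript_speech(audio_path):
    return f"(speech from {audio_path})"


def creating_conversation_history(history, transcripted_text):
    # Assumption: the real helper also separates turns with "====".
    return history + "\n====\n" + transcripted_text


def generate_conversation_from_history(history):
    return f"(reply #{history.count('====')})"


initial_situation = "You are at a cafe near the beach."
combined_history = ""
first_time = True

for audio_path in ["turn1.wav", "turn2.wav"]:
    text = transcript_speech(audio_path)
    # The first turn starts from the situation; later turns from the history.
    base = initial_situation if first_time else combined_history
    first_time = False
    history = creating_conversation_history(base, text)
    conversation = generate_conversation_from_history(history)
    combined_history = history + "\n====\n" + conversation

segments = [s.strip() for s in combined_history.split("====")]
print(len(segments))   # situation + two user turns + two replies
print(segments[0])
print(segments[-1])
```

Because the history grows by two segments per turn, passing only the first and last segments to the image generator keeps the prompt size roughly constant.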
# Create the Gradio interface for the language tutor app
tutor_app = gr.Interface(
    conversation_generation,
    gr.Audio(sources=["microphone"], type="filepath"),
    outputs=[gr.Image(value=img), gr.Text(), gr.Audio(type="filepath")],
    title="Speaking Language Tutor App",
    description=initial_situation
)
# Launch the Gradio app
tutor_app.launch()
02:20 This code sets up the Gradio interface for the language tutor app. The gr.Interface function takes conversation_generation as the main function to handle the conversation logic. It specifies that the user will provide audio input via a microphone, and the outputs will include an image, text, and an audio file. The interface is titled “Speaking Language Tutor App” and includes a description based on the initial situation. Finally, tutor_app.launch() starts the Gradio app, enabling users to practice conversational English interactively.
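One practical detail with a `type="filepath"` audio input is that the handler may be called with `None` if the user submits without recording anything; this is an assumption about Gradio's behaviour worth verifying against your installed version. A minimal sketch of a guard follows. The names `require_audio` and `fake_conversation_generation` are hypothetical and exist only for this illustration.

```python
# Sketch: guard a handler against an empty submission. Assumes the audio
# component passes None when nothing was recorded.
from functools import wraps


def require_audio(handler, message="Please record something first."):
    """Wrap an (audio_path) -> (image, text, audio) handler."""
    @wraps(handler)
    def wrapper(audio_path):
        if audio_path is None:
            # Keep the three-output shape the interface expects.
            return None, message, None
        return handler(audio_path)
    return wrapper


# Hypothetical stand-in for conversation_generation, for illustration only.
def fake_conversation_generation(audio_path):
    return "image.png", f"heard {audio_path}", "speak_speech.mp3"


guarded = require_audio(fake_conversation_generation)
print(guarded(None))
print(guarded("clip.wav"))
```

In the app, the wrapped handler would be passed to gr.Interface in place of conversation_generation. Returning `None` for the image output will likely clear the displayed image, so you may prefer to return the previous image instead.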
02:58 Once the app is ready, you can use it to practice conversational English.
03:03 As you can see, the initial situation is you being at the cafe near the beach and striking up a conversation with a stranger.
You can click the Record button, and say something like, “Yes, I do. What about you?”
Then click the Stop button to finish the recording. You can hear your voice by clicking the Play button.
If you’re happy, you can click the Submit button.
03:23 Wait for a while. Then you’ll get a generated image and a response from AI. You can read the response or play the audio response. In this case, the response is, “Ah, absolutely! There’s something magical about the ocean waves, don’t you think?”
03:40 To continue this conversation, you can click the small x button in the audio input. Then click the Record button again.
You can say something like, “Yes, even my favorite hobby is surfing.” Then you can click the Submit button.
You’ll get another generated image representing the latest situation and another response from AI.
In this case, the response is, “That’s awesome! I’ve always wanted to learn how to surf. Maybe you could give me some pointers sometime?”
04:08 This process continues, with the app dynamically updating the conversation, images, and audio prompts based on your responses, creating an engaging and interactive language learning experience.
04:20 Now, it’s time to proceed to this lesson’s conclusion.
This content was released on Nov 14 2024. The official support period is 6-months
from this date.
Learn how to build a multimodal language tutor app using Gradio in Jupyter Lab. This demo covers setting up the initial situational context, handling user interactions, updating the conversation, and providing visual and audio feedback.