Summary: I built an Arabic dictionary that you can look up words in even if you don't know how to spell them in Arabic. The dictionary is here and the code is here. Below, I discuss my motivations for building the dictionary and the tools I used to put it together.
Arabic is hard!
Arabic is a difficult language to learn for English speakers. The Foreign Service Institute classifies it as a Category V language alongside Chinese, Japanese and Korean, requiring 88 weeks or 2200 hours of class instruction to achieve fluency -- compare this with Category I languages like Spanish, which take 24 weeks or 600 hours. When I was studying Arabic in college, the course was worth six credits rather than the usual four to "compensate" us for getting less far for the same amount of study than if we'd taken most other languages.
One of the major difficulties beginning learners face is mastering the writing system. It goes right-to-left. Letters change shape according to their surroundings. For example, these are all the same letter:
Even more challenging, a large number of the sounds that these letters
represent are unfamiliar, and easily confused with other sounds. Here's ق /q/, the voiceless uvular stop, which to an English speaker
might be an odd-sounding /k/:
Source: Wikipedia
The pharyngeal fricative ع /ʕ/ sounds like an /a/ with a sore throat:
Source: Wikipedia
The "emphatic" consonants such as ط /tˤ/ are pronounced with a
secondary constriction of the pharynx, which sounds like a normal /t/
followed by a backer, rounder, lower vowel; there's a short vs long vowel
distinction that English lacks...the list goes on. This was actually the first webapp I ever built, probably around 2007, using
Perl's CGI module. When a user entered a word in English transliteration such as
<kitab>, I'd match each letter with possible variants, for example <k>
could match /k/ or /q/, <i> could match a long /i:/ or a short vowel /i/,
<t> could match /t/ or /tˤ/, etc. I'd take all the possible combinations and pass them
through the free Buckwalter
Arabic Morphological Analyzer, which consists of a Perl script with several
dictionary files, giving glosses and part-of-speech information. It worked pretty well for my very first webapp, and was decently fast,
but it was searching through a lot of impossible words, and could wind up
pretty far away from where the user's input started. I've now replaced this
part of the logic with a web service from
Yamli, which develops tools to help Arabic users
access the internet more easily. One of these is a
smart Arabic keyboard,
which lets you type words in English transliteration or
Arabic chat
alphabet and instantly suggests Arabic words they might correspond to.
You can try it in the Yamlified textbox below.
At the same time, I replaced the Perl/CGI script with a simple Flask app
deployed on Heroku. It takes Yamli's suggested words and sends them via
Ajax (JQuery) to a Python port of the Buckwalter Arabic Morphological Analyzer.
The code is
here.
It's not perfect: the Buckwalter Arabic Morphological Analyzer was updated
a couple of times after the free 1.0 version, but those versions
require a licence fee.
So there are a few errors in parts of speech, etc.
Also, there are times when Yamli
doesn't suggest a word I might expect, for example "ab" doesn't give me
August (آب). On the whole, though, I'm pretty pleased with the result.
Give it a try,
and if you have any comments, bug reports, or suggestions,
let me know below.Enter the Fuzzy Arabic Dictionary