Phone Number Word Games part 2 (by CalvinH)
As I said before in this blog, I have an optimized version of a program to solve a programming challenge
To recap, the challenge was to enumerate the possible phrases that can be obtained from a phone number, like 625-329-4741 is MakeAWish1
I understood the original challenge to be to solve the problem, without regard to efficiency. So, in a couple hours, I came up with a solution that wasn’t very efficient that I posted last week. Subsequently, I polished the optimized version below.
You can try this algorithm out by visiting this site and entering any phone number or string of letters. (Sorry, this site is internal to Microsoft only: I don’t want to bog down my tired old server, described in this blog)
It takes up to 15 seconds to calculate phrases based on a phone keypad and look them up in a 50,000 word dictionary.
For the original phone number of “642-394-6369” there were 20101 results calculated in about 10 seconds.
&& There are 3 parts to this algorithm.
&& First (GetDigSeqs) all the possible unique digit subsequences are enumerated into DigSeq.
&& For 123, this would be (123), (23), (3). For N digits, it would be maximum N sequences.
&& Part 2: (EnumNum) for each DigSeq, enumerate all possible letter words of length from 1 to len(DigSeq),
&& look them up in the dictionary, and place unique ones in SubWords
&& Part 3: (FormPhrase) For each digseq and, subword, assemble an N length phrase
&& EnumNum This part uses 2 parallel strings: the number to permute and a pattern
&& cPattern governs the pattern of letters to use for each digit's place
&& a "0" for each digits place to indicate which letter of the digit it is (0,1,2,3). 0 indicates use the raw digit
&& At the end, it'll be "1" (for 0,1), "3" (for 2,3,4,5,6,8), or "4" for "7,9", like "3334433143"
CLEAR
cNumber=INPUTBOX("Phone #?","Phone #","642-394-6369") && number of patterns is 4^8*5^2 = 1638400
dtStart=SECONDS()
GetPhrases(STRTRAN(cNumber,"-","")) && remove dashes
dtEnd=SECONDS()
?"#secs= ",dtEnd-dtStart,"num=",RECCOUNT("phrases")
#define MAXDIGITLEN 2 && max seq length of digits in final result
PROCEDURE GetPhrases(cNumber as String) && Given a digit sequence("6423946369") get all phrases
PUBLIC NUMDIGS,ox as dictionary.dict
NUMDIGS = LEN(cNumber)
ox=CREATEOBJECT('dictionary.dict') && Instantiate the dictionary COM object
ox.DictNum=2 && 1 = 171000 word dictionary, 2= 53869 word
CREATE CURSOR DigSeqs (DigSeq char(NUMDIGS+1)) && all Digit subsequences
INDEX ON DigSeq TAG DigSeq
CREATE CURSOR phrases (phrase char(30)) && a table into which we can put all resulting phrases
CREATE CURSOR SubWords (SubDigits char(NUMDIGS+1),SubWord char(NUMDIGS+1)) && All subwords found in dictionary
INDEX ON SubDigits TAG SubDigits
INDEX ON SubWord TAG SubWord
SET ORDER TO 1
SELECT SubWords
GetDigSeqs(cNumber)
FormPhrase(cNumber,"")
PROCEDURE GetDigSeqs(cNumber as String) && cNumber is all digis. Find the subsequences.
LOCAL i,fShorterFound
IF !SEEK(cNumber+' ',"DigSeqs") && if the number is not in the table
fShorterFound= SEEK(LEFT(cNumber,LEN(cNumber)-1)+' ') && Chop off 1 digit at the end and see if that's found
INSERT INTO DigSeqs VALUES (cNumber) && add the number into the table
IF !fShorterFound && if the shorter was not found,
EnumNum(cNumber,REPLICATE("0",LEN(cNumber)),1) && enumerate all words, look in dictionary
ENDIF
ENDIF
IF LEN(cNumber) >1 && If we need to recur
GetDigSeqs(SUBSTR(cNumber,2)) && Recur with the string without the first digit
ENDIF
RETURN
PROCEDURE EnumNum(cNumber as String,cPattern as String, nDigNum as Integer) && cNumber is raw string of digits with no separators
LOCAL i,cDigit,cPat,cStartLet,cLastPat,cNewLet
cDigit=SUBSTR(cNumber,nDigNum,1) && the digit we're working on
cPat=SUBSTR(cPattern,nDigNum,1) && where are we on this digit?
cStartLet=SUBSTR(" adgjmptw",ASC(cDigit)-47,1) && Starting letter for this digit: 0 1 2abc, 3def, 4ghi, 5jkl, 6mno, 7pqrs, 8tuv, 9wxyz
cLastpat =SUBSTR("0033333434",ASC(cDigit)-47,1) && # permutations -1 for this digit: 0 0 4, 4, 4, 4, 4, 5, 4, 5
DO WHILE .t.
IF !"0"$LEFT(cPattern,nDigNum)
IF !SEEK(LEFT(cNumber,nDigNum)+' ',"SubWords",2) and ox.isword(LEFT(cNumber,nDigNum))
INSERT INTO SubWords (SubDigits,SubWord) VALUES (LEFT(DigSeqs.DigSeq,nDigNum),LEFT(cNumber,nDigNum))
ENDIF
IF nDigNum < LEN(cNumber) && we haven't enumerated all digits yet
EnumNum(cNumber,cPattern,nDigNum+1) && recur with next digit
ENDIF
ENDIF
IF cPat=cLastPat && this digit has reached the end (for '7', we've done "7","p","q","r","s")
EXIT
ENDIF
cPat = CHR(ASC(cPat)+1) && increment the pattern
cPattern=LEFT(cPattern,nDigNum-1) + cPat + SUBSTR(cPattern, nDigNum+1) && insert the new letter into the pattern
cNewlet = CHR(ASC(cStartLet)+ASC(cPat)-49) && from 'a' to 'b' or from 'n' to 'o'
cNumber=LEFT(cNumber,nDigNum-1) + cNewLet + SUBSTR(cNumber, nDigNum+1) && insert the new letter into the number
ENDDO
RETURN
PROCEDURE FormPhrase(cNumber as String,cPartResult as String) && cNumber is digit only sequence
LOCAL i,nLen,nRec,cPartDigit
nLen = LEN(cNumber)
IF nLen = 0
INSERT INTO phrases VALUES (SUBSTR(cPartResult,2)) && if we've used all the digits, then we have a phrase. Remove the initial '-'
ELSE
FOR i = 1 TO nLen
cPartDigit=LEFT(cNumber,i) && the first i digits of the number
IF i <=MAXDIGITLEN AND LEN(cPartResult)>0 AND ISALPHA(RIGHT(cPartResult,1)) && don't insert 2 digit sequences adjacent
FormPhrase(SUBSTR(cNumber,i+1), cPartResult+'-'+ cPartDigit) && insert the digits as a part result
ENDIF
IF LEFT(cPartDigit,1)$"01" AND i <= MAXDIGITLEN && if it starts with '0' or '1', it's ok
FormPhrase(SUBSTR(cNumber,i+1), cPartResult+'-'+ cPartDigit)
ELSE
SEEK cPartDigit+' ' && now find any subword associated with this digit sequence
SCAN WHILE SubDigits=cPartDigit+' ' && for each subwords found (364 can be "dog" or "fog")
nRec=RECNO() && save/restore the current record
FormPhrase(SUBSTR(cNumber,i+1),cPartResult+'-'+ TRIM(SubWords.SubWord)) && recur with those subwords as partial results
GO nRec
ENDSCAN
ENDIF
ENDFOR
ENDIF
RETURN
End of program
Comments
- Anonymous
August 03, 2004
The link for me to try this out is broken. Would you tell me what it is?
mlprobka@imfpubs.com - Anonymous
August 03, 2004
I just posted a new blog entry with a new link
http://weblogs.asp.net/calvin_hsia/archive/2004/08/03/207286.aspx - Anonymous
September 27, 2007
Since July 4 th is nearing, I thought it would be appropriate to start my independent blog. In this first