Regular Expressions From Scratch, Part Four: The Kleene Closure of a Language

Článek
11/28/2005

Languages are sets, so we can take any two languages (over the same alphabet) and take their union to form a new language. Just as a reminder:

Definition 7: The union of two sets L and K is the set with all the members found in either, and is written L ∪ K.

We're going to take the Kleene Star one level further and say that it applies to languages too.

Definition 8: The Kleene Closure of a language L is L* = { e } ∪ L ∪ LL ∪ LLL …

Got it? L* is the language consisting of set of strings, where those strings are any number of strings in L concatenated together.

I want to extend our notion of concatenation in one more way. When I write a string or string variable next to a language, I mean "concatenate the language onto the end of the language that consists of only this string". So when I say abLa I mean {ab}L{a} Again, you can infer the braces from the context.

Let's do an example of using * and string concatenation of languages, just so its clear.

S = {0, 1}
L = {011*}
1L* = {1, 101, 1011, 101101, 1011011, …}

Or, in English, this is the set of strings w in S* such that w:

starts with a single 1,
has no more than one 0 in a row, and
ends with 1

We need one more thing to make this notation sufficiently terse. You probably noticed that the Kleene Star above only applied to the thing immediately to the star's left, whether the language or the symbol. Suppose we've got four languages: H, J, K and L. We can form two new languages through concatenation: G = HJ and F = KL. And then we can form a seventh language, D = G*F*. But we can't write D = HJ*KL*, because that's a different language. What we mean is D = (HJ)*(KL)*.

Therefore I'm declaring that we can use parenthesis in the normal way we're all used to for order of operations. We will figure out from context whether parens mean "this is a finite sequence" or "this is a grouping of concatenations".

Remember, we're looking for a way to concisely characterize languages, and clearly we're onto something here. We now have union, grouping, concatenation and Kleene closure. In other words, we have enough tools to define regular expressions, which we'll do next time. Stay tuned!

Comments

Anonymous
November 30, 2005
Just so you know we are out here!
Thanks for this
Anonymous
November 30, 2005
The comment has been removed
Anonymous
January 08, 2006
Hello Eric, I don't understand what you mean by "has no more than one 0 in a row". What row? Do you mean a member of 1L*? As the example shows that we can have more that one 0...
Anonymous
January 24, 2006
Shawn, I think he means that there is no more than one consecutive 0. So you can not have any strings that contain 00 or 000, but they could contain 1010, etc.
Anonymous
February 21, 2006
The comment has been removed
Anonymous
March 03, 2006
When it comes to validating input regular expression becomes a very important part of your security plan.&nbsp;...
Anonymous
November 25, 2009
Woldn't you need an aditional rule there for 1L* ? 10101 starts with single 1, has no more than one 0 in a row and ends in 1, but does not belong in 1L*. How about adding:

there are always two 1s between 0s ?

Anonymous
June 18, 2012
You are awesome, in the above topic you explained in a way every one could understand using few words! Perhaps i will call it best explanation of the topic!

Sdílet prostřednictvím

Regular Expressions From Scratch, Part Four: The Kleene Closure of a Language

Comments

Další materiály