Splitting age text using regex

Mark Wingfield 21 Reputation points
2021-05-04T22:20:02.3+00:00

I work with historical burial records and want to work with Ages in these documents for calculating rough birth dates. This will be within vba code.

Sometimes, these are not simply an integer but are strings showing years, months, weeks and/or days.

An example using all of these, with any spaces removed, is below.

11years3months2weeks3days

The years/months/weeks/days texts could be a list of abbreviations such as mths/m, wks/w etc.

If, as in the example, more than one of these time periods appears, they will be in that order (largest to smallest time period) and the numeric values should make sense ie up to 120 at most for years, 1-23 for months(children can have ages such as 18 months), probably up to 51 weeks, and probably up to 365 days.

However, these are all optional so there could be exams such as below:

1mth3dys
2years6months
3wks5days

I am sure regex can check these formats. I have had some ideas but am not experienced in regex.

Can a regex expression be used to, not just identify tye format but to get the numeric values for any years/months/weeks/days in the text? If so, what code could be used in a vba function to do this?

Cheers

Developer technologies | Visual Basic for Applications
0 comments No comments
{count} votes

Accepted answer
  1. Viorel 122.6K Reputation points
    2021-05-05T07:22:01.47+00:00

    If you are interested, here is another approach that uses a single expression:

    Dim example As String
    example = "11years3months2weeks4days"
    
    Dim re As New RegExp
    re.Pattern = "^\s*((\d+)\s*(year|years|yrs|y))?\s*((\d+)\s*(month|months|mths|m))?\s*((\d+)\s*(week|weeks|w))?\s*((\d+)\s*(day|days|d))?\s*$"
    re.IgnoreCase = True
    
    Dim mc As MatchCollection
    Set mc = re.Execute(example)
    
    If mc.Count = 0 Then
    
        MsgBox "Input not recognised"
    
    Else
        Dim f As Match
        Set f = mc(0)
    
        Dim y As Integer
        Dim m As Integer
        Dim w As Integer
        Dim d As Integer
    
        y = CInt(Val(f.SubMatches(1)))
        m = CInt(Val(f.SubMatches(4)))
        w = CInt(Val(f.SubMatches(7)))
        d = CInt(Val(f.SubMatches(10)))
    
        MsgBox "Years: " & y & ", month: " & m & ", weeks: " & w & ", days: " & d
    
    End If
    

    Adjust the pattern if you find more forms of inputs.

    1 person found this answer helpful.
    0 comments No comments

3 additional answers

Sort by: Most helpful
  1. Tom van Stiphout 1,861 Reputation points MVP Volunteer Moderator
    2021-05-05T02:36:54.34+00:00

    I doubt it. Regex is about REGULAR expressions, and these seem very unregular (e.g. months/mth). You may be able to shoe-horn it in, but your expression would be ginormous and impossible to maintain. Hence: bad code.
    I would go for simpler string parsing, perhaps aided by a table of translations: (e.g. month > m, mth > m)

    0 comments No comments

  2. Viorel 122.6K Reputation points
    2021-05-05T06:16:24.023+00:00

    For example, to extract the years, try a code like this:

    Dim example As String
    example = "2years6months"
    
    Dim re As New RegExp
    re.Pattern = "(\d+)(years|year|yrs)"
    re.IgnoreCase = True
    
    Dim ms As MatchCollection
    Set ms = re.Execute(example)
    
    Dim n As Integer
    n = 0
    
    If ms.Count <> 0 Then
    
        Dim m As Match
        Set m = ms(0)
    
        n = CInt(m.SubMatches(0))
    
        MsgBox "Number: " & n
    
    Else
    
        MsgBox "Not found"
    
    End If
    

    It also includes some MsgBox for debugging. To use RegEx in VBA, add a reference to “Microsoft VBScript Regular Expression 5.5” from Tools menu.

    Adjust the pattern to extract other components. To avoid code repetition, you can create a common function, then “years|year|yrs” will be sent as a parameter.

    Theoretically it is possible to write a single complex regular expression to extract all of the numbers, but the above approach is probably simpler.

    However, if you also want to detect the bad inputs, you can write a single regular expression that includes all of the possible components.

    0 comments No comments

  3. Mark Wingfield 21 Reputation points
    2021-05-08T19:51:56.027+00:00

    Thanks Viorel-1.

    That second script is perfect. I now have some new knowledge of SubMatches which I did not know before.

    Cheers

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.