regular expression exact date with multiple formats

asked 2020-05-05T10:31:21.75+00:00
dani shamir 81 Reputation points

I have a large file with URL strings such as:

http://tg24.sky.it/mondo/2020/05/01/corea-nord-kim-riappare.html

http://tg24.sky.it/mondo/01/05/2020/corea-nord-kim-riappare.html
http://tg24.sky.it/mondo/2020/04/30/corea-nord-kim-riappare.html

http://tg24.sky.it/mondo/04/30/2020/corea-nord-kim-riappare.html

I need to extract only the URLs that contain the date 01-05-2020, in whatever format it appears, with or without separators.

So I have written the following regexp:

^.*/?0?(1|5|(?:20)?20)[/-]0?(1|5|(?:20)?20)[/-]0?(1|5|(?:20)?20)/?.*$

It works fine, but it also finds false positives such as:

XXXX/5/5/5/YYYYY

So I understand that I need to enhance it so that if the first group matches MM, the second group matches only DD or YYYY, and the third group matches only whatever is left.
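One way to express that "each part exactly once" idea is to spell out all six orderings of day, month and year as alternatives, rather than letting every position accept every token. The following is only a minimal sketch in Python, not a tested production pattern. The names `build_date_pattern` and `has_target_date` are made up for illustration, and it assumes the separator is an optional `/` or `-` and that the date must not sit inside a longer run of digits.

```python
import re
from itertools import permutations

# Tokens for the date 01-05-2020. The optional leading zero and the
# optional century mirror the original pattern.
DAY = r"0?1"
MONTH = r"0?5"
YEAR = r"(?:20)?20"
SEP = r"[/-]?"  # assumption: separator is optional, either "/" or "-"


def build_date_pattern(day=DAY, month=MONTH, year=YEAR, sep=SEP):
    """Compile a regex that needs day, month and year once each, in any order."""
    # Six orderings, e.g. day-sep-month-sep-year, year-sep-month-sep-day, ...
    orders = (sep.join(order) for order in permutations((day, month, year)))
    # The digit guards keep the date from matching inside a longer number.
    return re.compile(r"(?<!\d)(?:" + "|".join(orders) + r")(?!\d)")


DATE_RE = build_date_pattern()


def has_target_date(url):
    """Return True if the URL contains the target date in any ordering."""
    return DATE_RE.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://tg24.sky.it/mondo/2020/05/01/corea-nord-kim-riappare.html",
        "http://tg24.sky.it/mondo/01/05/2020/corea-nord-kim-riappare.html",
        "http://tg24.sky.it/mondo/2020/04/30/corea-nord-kim-riappare.html",
        "XXXX/5/5/5/YYYYY",
    ]
    for s in samples:
        print(has_target_date(s), s)
```

Because each alternative uses every token exactly once, `XXXX/5/5/5/YYYYY` no longer matches. Two caveats apply: with optional separators and optional leading zeros, a compact string such as `1520` is still read as 1-5-20, and a pattern like this cannot tell DD/MM from MM/DD, so `05/01/2020` is accepted as well.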

Any thoughts on how to do it?

Thanks,

Dani

Azure Active Directory Domain Services

1 answer

  1. answered 2020-05-29T18:42:51.003+00:00
    Saurabh Sharma 17,291 Reputation points Microsoft Employee

    Hi,

    Q&A currently supports the products listed over here https://learn.microsoft.com/en-us/answers/products (more to be added later on).

    You might want to reach out to the experts on Stack Overflow.

    (Please don't forget to accept helpful replies as answer)
