다음을 통해 공유


DAX: Dates Prior to 1900

https://msdnshared.blob.core.windows.net/media/2016/08/7827.NinjaAwardTinyBronze.pngBronze Award Winner


Scenario

In attempting to analyze such data as the entire history of The Ohio State University's football program (colloquially "The Buckeyes"), challenges arise because part of the program's history pre-dates January 1st, 1900. As a rule, DAX handles dates prior to this arbitrary date poorly. Thus, attempting to use Power BI to do something as seemingly trivial as provide a multi-line card or table visualization, such as the one show below, containing the start and end years for each coach becomes challenging.

http://community.powerbi.com/t5/image/serverpage/image-id/1716i9FF0012A212890E5/image-size/large?v=mpbl-1&px=600

Setup

To setup the scenario, we start with two CSV files, one for season data and the other for coaching data.

osu-football-seasons.csv (sample data)

Season,Wins,Losses,Ties,Coach,Conference,Bowl Game,Highest Rank,Final Rank
2014,14,1,,Urban Meyer,Big Ten,Sugar Bowl,,
2013,12,2,,Urban Meyer,Big Ten,Orange Bowl,,
2012,12,0,,Urban Meyer,Big Ten,,,
2011,6,7,,Luke Fickell,Big Ten,Gator Bowl,,
2010,12,1,,Jim Tressel,Big Ten,Sugar Bowl,,
1919,6,1,,John Wilce,Ohio Athletic,,, 
1918,3,3,,John Wilce,Ohio Athletic,,, 
1917,8,0,1,John Wilce,Ohio Athletic,,, 
1916,7,0,,John Wilce,Ohio Athletic,,, 
1915,5,1,1,John Wilce,Ohio Athletic,,, 
1914,5,2,,John Wilce,Ohio Athletic,,, 
1913,4,2,1,John Wilce,Ohio Athletic,,, 
1912,6,3,,John Richards,Ohio Athletic,,, 
1911,5,3,2,Harry Vaughn,Ohio Athletic,,, 
1910,6,1,3,Howard Jones,Ohio Athletic,,, 
1909,7,3,,Albert Herrnstein,Ohio Athletic,,, 
1908,6,4,,Albert Herrnstein,Ohio Athletic,,, 
1907,7,2,1,Albert Herrnstein,Ohio Athletic,,, 
1906,8,1,,Albert Herrnstein,Ohio Athletic,,, 
1905,8,2,2,ER Sweetland,Ohio Athletic,,, 
1904,6,5,,ER Sweetland,Ohio Athletic",,, 
1903,8,3,,Perry Hale,Ohio Athletic,,, 
1902,6,2,2,Perry Hale,Ohio Athletic,,, 
1901,5,3,1,John Eckstorm,Independent,,, 
1900,8,1,1,John Eckstorm,Independent,,, 
1899,9,0,1,John Eckstorm,Independent,,, 
1898,3,5,,Jack Ryder,Independent,,, 
1897,1,8,,David Edwards,Independent,,, 
1896,5,5,1,Charles Hickey,Independent,,, 
1895,4,4,2,Jack Ryder,Independent,,, 
1894,5,6,,Jack Ryder,Independent,,, 
1893,4,5,,Jack Ryder,Independent,,, 
1892,5,3,,Jack Ryder,Independent,,, 
1891,2,2,,Alexander Lilley,Independent,,,
1890,1,3,,Alexander Lilley,Independent,,,

coaches.csv

Coach,Image
Urban Meyer,https://upload.wikimedia.org/wikipedia/commons/3/3​1/Urban_Meyer_vs_Michigan%2C_2013.jpg
Luke Fickell,https://upload.wikimedia.org/wikipedia/commons/b/b​5/Luke_Fickell_in_2014.jpg
Jim Tressel,https://upload.wikimedia.org/wikipedia/en/a/af/Coa​ch_Tressel.png
John Cooper,https://upload.wikimedia.org/wikipedia/commons/d/d​4/John_Cooper_at_The_Game_in_2014.jpg
Earle Bruce,
Woody Hayes,https://upload.wikimedia.org/wikipedia/en/d/d3/Woo​dy_Hayes.jpg
Wes Fesler,https://upload.wikimedia.org/wikipedia/commons/2/2​e/Wesley_Fesler.jpg
Paul Bixler,
Carroll Widdoes,https://upload.wikimedia.org/wikipedia/en/a/ac/Car​roll_Widdoes.jpg
Paul Brown,https://upload.wikimedia.org/wikipedia/en/6/6c/Bro​wns_coach_Paul_Brown_with_players%2C_1952.jpg
Francis Schmidt,https://upload.wikimedia.org/wikipedia/en/8/85/Fra​ncis_Schmidt.jpg
Sam Willaman,https://upload.wikimedia.org/wikipedia/en/a/a2/Sam​_Willaman.jpg
John Wilce,https://upload.wikimedia.org/wikipedia/en/b/b3/Joh​n_Wilce.jpg
John Richards,https://en.wikipedia.org/wiki/John_R._Richards#/me​dia/File:John_R._Richards.png
Harry Vaughn,https://upload.wikimedia.org/wikipedia/commons/e/e​7/Vaughan-harry.jpg
Howard Jones,https://en.wikipedia.org/wiki/Howard_Jones_(Americ​an_football)#/media/File:Howard_H._Jones.png
Albert Herrnstein,https://upload.wikimedia.org/wikipedia/en/f/f0/Alb​ert_Herrnstein.jpg
ER Sweetland,https://upload.wikimedia.org/wikipedia/en/4/48/Swe​etland_from_1898_Cornell_varsity_crew.jpg
Perry Hale,https://upload.wikimedia.org/wikipedia/commons/e/e​a/Perry_Hale.png
John Eckstorm,https://upload.wikimedia.org/wikipedia/commons/1/1​4/Dr._JBC_Eckstorm.png
Jack Ryder,https://upload.wikimedia.org/wikipedia/commons/c/c​3/Jack_Ryder_OSU.jpg
David Edwards,https://upload.wikimedia.org/wikipedia/commons/0/0​d/David_Farragut_Edwards.jpg
Charles Hickey,https://upload.wikimedia.org/wikipedia/commons/9/9​5/CAHickey.png
Alexander Lilley,https://upload.wikimedia.org/wikipedia/commons/2/2​c/Alexander_S_Lilley.jpg

After importing the tables into the data model, create a relationship between the “Coach” columns.

http://community.powerbi.com/t5/image/serverpage/image-id/1717i24991EDDCC2DC297/image-size/large?v=mpbl-1&px=600 

Challenges

The stated goal is to create a visualization that contains the first and last years that a particular individual was the head coach of The Buckeyes. DAX has two seemingly very convenient functions for this, FIRSTDATE and LASTDATE. However, both of these require a column containing date values as an input parameter. DAX also has a function for this called DATE. The DATE function takes three numeric input parameters, Year, Month and Day. Thus, we can theorize that we can create a date column to be used with FIRSTDATE and LASTDATE by creating a new column with the following formula in osu-football-seasons:

  • Date = DATE([Year],1,1)

This formula should return a January 1st date for each corresponding year of the football season contained in the osu-football-seasons table. However, a closer look through the data of this column reveals dates such as 1/1/3799, 1/1/3798, 1/1/3797, etc. Since one can be reasonably certain that analyzing historical data would not include dates nearly 2,000 years in the future, something is definitely amiss. An even closer look shows that these odd dates occur only for years prior to 1900. A little research on the DAX DATE function reveals that this behavior is “by design”. The DATE function reference, https://support.office.com/en-US/article/DATE-Func tion-DAX-1d2cde95-a787-4e93-adea-d638389a0494, clearly states:

If the year value is between 0 and 1899, the value is added to 1900 to produce the final value. See the examples below.”

and as an example provides:

Example: Years before 1899

Description

If the value that you enter for the year argument is between 0 (zero) and 1899 (inclusive), that value is added to 1900 to calculate the year. The following formula returns January 2, 3700: (1900+1800).

Code

=DATE(1800,1,2)

Interestingly, this seemingly arbitrary date of 1/1/1900 also happens to be the base date for Microsoft SQL Server. Microsoft SQL Server uses 8 bytes to store a DateTime value—the first 4 for the date and the second 4 for the time. For the date portion, the value SQL Server stores are the number of days before or after the base date of January 1, 1900. Therefore, a value of 0 is January 1, 1900 while a positive number represents dates after January 1, 1900 and a negative number represents dates prior to January 1, 1900.

Seeking another solution, there is a DATEVALUE function in DAX whose documentation makes no mention of the “prior to 1900” issue. The DATEVALUE function takes a single text parameter that is a text representation of the date and returns a date in date/time format. Thus, we can create two additional columns:

  • DateText = "1/1/" & [Season]
  • DateValue = DATEVALUE([DateText])

Unfortunately, the entire contents of the DateValue column end up being “Invalid date”. Further investigation reveals that removing the pre-1900 dates gets rid of this error and returns the correct date/time values for the remaining seasons.

Thus, we can start to conclude that the issue with handling pre-1900 dates, while undocumented, extends to other functions within DAX and quite possibly throughout the entire language.

Solution

To resolve the challenges above, one can use power query’s “M” language, which does not seem to suffer from the same pre-1900 date challenges as the DAX language. We can modify the power query to add a step to concatenate "1/1/" and [Year] column and then another step to set the resulting column to a date/time type. The column name and formula is:

  • SeasonDate = "1/1/" & Text.From([Season])

The additional two steps concatenate a string and convert it to a date field. We can see in the query preview that the correct years are preserved. Once imported into the data model, we can also see that the correct years are preserved for the SeasonDate column. Thus, we can conclude that the data model itself also does not suffer from the pre-1900 date challenges and can store dates prior to 1900 successfully.

We can now use this column as the input parameter to FIRSTDATE and LASTDATE. Creating two new columns in the coaches table:

  • First Year Coached = YEAR(FIRSTDATE('osu-football-seasons'[SeasonDate]) )
  • Last Year Coached = YEAR(LASTDATE('osu-football-seasons'[SeasonDate]))

Unfortunately, we notice that we are back to having years in the 3700’s. Further investigation reveals that the FIRSTDATE and LASTDATE functions are returning behaving exactly as the DATE function. Interestingly though, creating a column in osu-football-seasons such as:

  • Year = YEAR([SeasonDate])

We see that the YEAR function does NOT suffer from the issue with pre-1900 dates and returns the correct years for the dates in the 1800’s.

To resolve this issue, we can modify the “First Year Coached” and “Last Year Coached” column formulas to:

  • First Year Coached = IF(YEAR(FIRSTDATE('osu-football-seasons'[SeasonDat e]))>3000,YEAR(FIRSTDATE('osu-football-seasons'[Se asonDate]))-1900,YEAR(FIRSTDATE('osu-football-seas ons'[SeasonDate])))
  • Last Year Coached = IF(YEAR(LASTDATE('osu-football-seasons'[SeasonDate ]))>3000,YEAR(LASTDATE('osu-football-seasons'[Seas onDate]))-1900,YEAR(LASTDATE('osu-football-seasons '[SeasonDate])))

Interestingly, the above formulas also work with our original [Date] field in place of [SeasonDate].

Finally, we can create our final column in the coaches table for use in our visualizations:

  • Years = [First Year] & " to " & [Last Year]

An alternative solution would be to use standard date/time subtraction, which subtracts days from a date/time value. Since DAX pre-1900 math adds 1900 to the year, in theory, if we subtract 1900 multiplied by the number of days in the year we can also resolve the issue. An example of this would be:

  • Days = IF(YEAR(FIRSTDATE('osu-football-seasons'[SeasonDat e]))>3000,FIRSTDATE('osu-football-seasons'[SeasonD ate])-(1900*365.243),FIRSTDATE('osu-football-seaso ns'[SeasonDate]))

Conclusion

Dealing with pre-1900 dates with DAX can be challenging. The issues extend well beyond the DATE function, where the issues are documented, to other date and time functions such as DATEVALUE, FIRSTDATE, and LASTDATE but not to all date and time functions. For example, the YEAR function does not suffer from the pre-1900 date issues. The Power BI data model does not suffer from similar issues and can store dates prior to 1900 without modification.

Because the power query “M” language and the Power BI data model handle pre-1900 dates without issues, the problems with pre-1900 dates in DAX can be worked around through creative formula construction.