pair_probabilities_fl()

Article
01/18/2024

Calculate various probabilities and related metrics for a pair of categorical variables.

The function pair_probabilities_fl() is a UDF (user-defined function) that calculates the following probabilities and related metrics for a pair of categorical variables, A and B:

P(A) is the probability of each value A=a
P(B) is the probability of each value B=b
P(A|B) is the conditional probability of A=a given B=b
P(B|A) is the conditional probability of B=b given A=a
P(A∪B) is the union probability (A=a or B=b)
P(A∩B) is the intersection probability (A=a and B=b)
The lift metric is calculated as P(A∩B)/(P(A)*P(B)). For more information, see lift metric.
- A lift near 1 means that the joint probability of the two values is close to what is expected if the variables are independent.
- Lift >> 1 means that the values co-occur more often than expected under the independence assumption.
- Lift << 1 means that the values co-occur less often than expected under the independence assumption.
The Jaccard similarity coefficient is calculated as P(A∩B)/P(A∪B). For more information, see Jaccard similarity coefficient.
- A high Jaccard coefficient, close to 1, means that the values tend to occur together.
- A low Jaccard coefficient, close to 0, means that the values tend to stay apart.
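
The definitions above can be restated in terms of row counts, which is how the function body shown later in this article estimates them within each scope. The symbols N (rows in the scope), n_a, n_b, and n_ab (rows with A=a, with B=b, and with both) are notation introduced here for illustration only.

```latex
\begin{align*}
P(A) &= \frac{n_a}{N}, \qquad P(B) = \frac{n_b}{N}, \qquad P(A \cap B) = \frac{n_{ab}}{N} \\
P(A \cup B) &= P(A) + P(B) - P(A \cap B) \\
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)} \\
\text{Lift} &= \frac{P(A \cap B)}{P(A)\,P(B)}, \qquad \text{Jaccard} = \frac{P(A \cap B)}{P(A \cup B)}
\end{align*}
```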

Syntax

pair_probabilities_fl(A, B, Scope)

Learn more about syntax conventions.

Parameters

Name	Type	Required	Description
A	scalar	✔️	The first categorical variable.
B	scalar	✔️	The second categorical variable.
Scope	scalar	✔️	The field that contains the scope, so that the probabilities for A and B are calculated independently for each scope value.
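
Because probabilities are calculated per scope value, one way to get metrics over all rows at once is to pass a scope column that holds a single constant value. The following is a minimal sketch, assuming pair_probabilities_fl() has already been created as a stored function; the purchases table and the all_stores column are hypothetical names used only for illustration.

```kusto
// Sketch only: assumes pair_probabilities_fl() already exists as a stored function.
// The table, its columns, and the all_stores helper column are hypothetical.
let purchases = datatable(item:string, payment:string, store:string)[
    'book', 'card', 'North',
    'book', 'cash', 'North',
    'pen',  'card', 'South',
    'pen',  'card', 'South'
];
purchases
| extend all_stores = 'all'     // constant scope: every row falls into one group
| invoke pair_probabilities_fl('item', 'payment', 'all_stores')
```

Passing 'store' instead of 'all_stores' would calculate the same metrics separately for each store.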

Function definition

You can define the function either by embedding its code as a query-defined function or by creating it as a stored function in your database, as follows:

Query-defined
Stored

Define the function using the following let statement. No permissions are required.

Important

A let statement can't run on its own. It must be followed by a tabular expression statement. To run a working example of pair_probabilities_fl(), see Example.

let pair_probabilities_fl = (tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA  = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB  = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope           // probability for each value of A
| join kind = leftouter (probB) on _B, _scope           // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB                       // union probability
       , P_AIB = P_AB/P_B                               // conditional probability of A on B
       , P_BIA = P_AB/P_A                               // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B)                     // lift metric
       , Jaccard_AB = P_AB/P_AUB                        // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
};
// Write your query to use the function here.

Define the stored function once using the following .create function command. Database User permissions are required.

Important

You must run this code to create the function before you can use the function as shown in the Example.

.create-or-alter function with (folder = "Packages\\Stats", docstring = "Calculate probabilities and related metrics for a pair of categorical variables")
pair_probabilities_fl(tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA  = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB  = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope           // probability for each value of A
| join kind = leftouter (probB) on _B, _scope           // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB                       // union probability
       , P_AIB = P_AB/P_B                               // conditional probability of A on B
       , P_BIA = P_AB/P_A                               // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B)                     // lift metric
       , Jaccard_AB = P_AB/P_AUB                        // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
}

Example

The following example uses the invoke operator to run the function.

Query-defined
Stored

To use a query-defined function, invoke it after the embedded function definition.


let pair_probabilities_fl = (tbl:(*), A_col:string, B_col:string, scope_col:string)
{
let T = materialize(tbl | extend _A = column_ifexists(A_col, ''), _B = column_ifexists(B_col, ''), _scope = column_ifexists(scope_col, ''));
let countOnScope = T | summarize countAllOnScope = count() by _scope;
let probAB = T | summarize countAB = count() by _A, _B, _scope | join kind = leftouter (countOnScope) on _scope | extend P_AB = todouble(countAB)/countAllOnScope;
let probA  = probAB | summarize countA = sum(countAB), countAllOnScope = max(countAllOnScope) by _A, _scope | extend P_A = todouble(countA)/countAllOnScope;
let probB  = probAB | summarize countB = sum(countAB), countAllOnScope = max(countAllOnScope) by _B, _scope | extend P_B = todouble(countB)/countAllOnScope;
probAB
| join kind = leftouter (probA) on _A, _scope           // probability for each value of A
| join kind = leftouter (probB) on _B, _scope           // probability for each value of B
| extend P_AUB = P_A + P_B - P_AB                       // union probability
       , P_AIB = P_AB/P_B                               // conditional probability of A on B
       , P_BIA = P_AB/P_A                               // conditional probability of B on A
| extend Lift_AB = P_AB/(P_A * P_B)                     // lift metric
       , Jaccard_AB = P_AB/P_AUB                        // Jaccard similarity index
| project _A, _B, _scope, bin(P_A, 0.00001), bin(P_B, 0.00001), bin(P_AB, 0.00001), bin(P_AUB, 0.00001), bin(P_AIB, 0.00001)
, bin(P_BIA, 0.00001), bin(Lift_AB, 0.00001), bin(Jaccard_AB, 0.00001)
| sort by _scope, _A, _B
};
//
let dancePairs = datatable(boy:string, girl:string, dance_class:string)[
    'James',   'Mary',      'Modern',
    'James',   'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Patricia',  'Modern',
    'Robert',  'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Classic',
    'Robert',  'Linda',     'Classic',
    'James',   'Mary',      'Classic',
    'James',   'Linda',     'Classic'
];
dancePairs
| invoke pair_probabilities_fl('boy','girl', 'dance_class')

Important

For this example to run successfully, you must first run the function definition code to store the function.

let dancePairs = datatable(boy:string, girl:string, dance_class:string)[
    'James',   'Mary',      'Modern',
    'James',   'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Patricia',  'Modern',
    'Robert',  'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Classic',
    'Robert',  'Linda',     'Classic',
    'James',   'Mary',      'Classic',
    'James',   'Linda',     'Classic'
];
dancePairs
| invoke pair_probabilities_fl('boy','girl', 'dance_class')

Output

Let's look at the list of pairs of people who dance, supposedly at random, in two dance classes, to find out whether anything looks anomalous (that is, not random). We start by looking at each class on its own.

The Michael-Patricia pair has a lift metric of 2.375, which is significantly above 1. This value means that they are seen together much more often than would be expected if the pairing were random. Their Jaccard coefficient is 0.75, which is close to 1. When either of them dances, they tend to dance together.
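
These two values can be checked by hand from the input data. In the Modern class there are 19 rows; Michael appears in 6 of them, Patricia in 8, and the two appear together in 6. Taking A = Michael and B = Patricia, a worked derivation looks like this:

```latex
\begin{align*}
P(A) &= \tfrac{6}{19} \approx 0.3158, \qquad P(B) = \tfrac{8}{19} \approx 0.4211, \qquad P(A \cap B) = \tfrac{6}{19} \\
P(A \cup B) &= \tfrac{6}{19} + \tfrac{8}{19} - \tfrac{6}{19} = \tfrac{8}{19} \\
\text{Lift} &= \frac{6/19}{(6/19)\,(8/19)} = \frac{19}{8} = 2.375 \\
\text{Jaccard} &= \frac{6/19}{8/19} = \frac{6}{8} = 0.75
\end{align*}
```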

_A	_B	_scope	P_A	P_B	P_AB	P_AUB	P_AIB	P_BIA	Lift_AB	Jaccard_AB
Robert	Patricia	Modern	0.31578	0.42105	0.05263	0.68421	0.12499	0.16666	0.39583	0.07692
Robert	Mary	Modern	0.31578	0.26315	0.15789	0.42105	0.59999	0.49999	1.89999	0.37499
Robert	Linda	Modern	0.31578	0.31578	0.10526	0.52631	0.33333	0.33333	1.05555	0.2
Michael	Patricia	Modern	0.31578	0.42105	0.31578	0.42105	0.75	0.99999	2.375	0.75
James	Patricia	Modern	0.36842	0.42105	0.05263	0.73684	0.12499	0.14285	0.33928	0.07142
James	Mary	Modern	0.36842	0.26315	0.10526	0.52631	0.4	0.28571	1.08571	0.2
James	Linda	Modern	0.36842	0.31578	0.21052	0.47368	0.66666	0.57142	1.80952	0.44444
Robert	Mary	Classic	0.49999	0.49999	0.24999	0.75	0.49999	0.49999	0.99999	0.33333
Robert	Linda	Classic	0.49999	0.49999	0.24999	0.75	0.49999	0.49999	0.99999	0.33333
James	Mary	Classic	0.49999	0.49999	0.24999	0.75	0.49999	0.49999	0.99999	0.33333
James	Linda	Classic	0.49999	0.49999	0.24999	0.75	0.49999	0.49999	0.99999	0.33333
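
Instead of scanning the table by eye, the output can be filtered for pairs that stand out on both metrics. The following is a minimal sketch, assuming pair_probabilities_fl() has already been created as a stored function; the thresholds 1.5 and 0.5 are arbitrary illustrative choices, not recommended values.

```kusto
// Sketch only: assumes pair_probabilities_fl() already exists as a stored function.
// Same input data as the example in this article.
let dancePairs = datatable(boy:string, girl:string, dance_class:string)[
    'James',   'Mary',      'Modern',
    'James',   'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Patricia',  'Modern',
    'Robert',  'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'Robert',  'Linda',     'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Modern',
    'Michael', 'Patricia',  'Modern',
    'Michael', 'Patricia',  'Modern',
    'James',   'Linda',     'Modern',
    'Robert',  'Mary',      'Classic',
    'Robert',  'Linda',     'Classic',
    'James',   'Mary',      'Classic',
    'James',   'Linda',     'Classic'
];
dancePairs
| invoke pair_probabilities_fl('boy', 'girl', 'dance_class')
| where Lift_AB > 1.5 and Jaccard_AB > 0.5    // hypothetical thresholds
| project _A, _B, _scope, Lift_AB, Jaccard_AB
```

Judging by the values in the table above, only the Michael-Patricia row in the Modern class satisfies both conditions.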
