Formula to sample from one table containing multiple populations

m8ksmelaf 21 Reputation points
2021-07-21T19:27:36.797+00:00

Hello - new to the forum and I apologize in advance if I'm not speaking VB/Excel properly and/or if a similar question has been asked/answered.

My scenario - I am provided with a table of data which consists of multiple separate populations. I am tasked with drawing a statistically significant random sample of each population within the table. So for example:

ColumnA | Column B | Column C
Smith | John | John Smith
Smith | Eric | Erick Smith
Smith | Jean | Jean Smith
Smith | Barbara | Barbara Smith
Johnson | Tony | Tony Johnson
Johnson | Gary | Gary Johnson
Johnson | Francis | Francis Johnson
Johnson | Tim | Tim Johnson
Johnson | Michelle | Michelle Johnson

And I require a random sampling of all "Smiths" and then from all "Johnsons." Ideally, my output will not be a random # assigned to each person, but rather it based on Column C, ie:

Results
Jean Smith
Tony Johnson
Francis Johnson

Thank you in advance

Excel Management
Excel Management
Excel: A family of Microsoft spreadsheet software with tools for analyzing, charting, and communicating data.Management: The act or process of organizing, handling, directing or controlling something.
1,716 questions
0 comments No comments
{count} votes

Accepted answer
  1. Lz._ 9,006 Reputation points
    2021-07-26T15:20:28.8+00:00

    @m8ksmelaf

    Excel workbook with a PQ solution avail. for download here. See the notes at the top of tblMonthlyTargets (the blue Table). I put in place a couple of things to secure your process

    117946-demo.png

    A brief description on how to add/remove rows from my tblMonthlyTargets as needed?
    Just add Last Names at the bottom of the existing table. One of the big advantages of Excel Tables => they auto-resize :)

    Ultimately I will need to employ this PQ on two different data sets, one containing 35 rows of targets, and one containing 24 rows of targets (and more importantly - if down the road if either population should happen to change, I could make the adjustments as needed)
    Same as above. Add as many rows as you need to tblFullNames and tblMonthlyTargets. On the other hand if i.e. a name should disappear from tblFullNames just clear the content of the row. Same thing with tblMonthlyTargets but clear the content of column [Last Name] only to preserve the formula in column [Count Unique]

    When you have tblFullNames and tblMonthlyTargets ready with your data click somewhere in the green table (MonthlySample) > Refresh - NB: the 1st Refresh within an Excel session takes more time than the next ones

    Any question/issue let me know and if this solves the issue please Accept as answer to help others with a similar scenario - Thanks

    1 person found this answer helpful.

5 additional answers

Sort by: Most helpful
  1. Herbert Seidenberg 1,191 Reputation points
    2021-07-22T01:08:00.627+00:00

    Excel 365 Pro Plus with Power Query.
    Pick one person from all the "Smiths" at random.
    No formulas, no VBA macro.
    https://www.mediafire.com/file/6cxj9rpd0lhhau5/07_21_21b.xlsx/file
    https://www.mediafire.com/file/21k34gor3ptml1m/07_21_21b.pdf/file


  2. Emily Hua-MSFT 27,646 Reputation points
    2021-07-22T07:09:23.41+00:00

    @m8ksmelaf

    Is there a requirement for the number of samples?
    I found for different data from Column A, we may use different formulas, according to required sample quantity, we can manually drop down to fill the cells.

    For sample from "Smith", please try the following formula.
    =INDEX($C$2:$C$10,RANDBETWEEN(1,COUNTIF($A$2:$A$10,$A$2)))
    116985-image.png

    For sample from "Johnson", you may try the formula below.
    =INDEX($C$2:$C$10,RANDBETWEEN(COUNTIF($A$2:$A$10,$A$2)+1,COUNTIF($A$2:$A$10,$A$2)+COUNTIF($A$2:$A$10,$A$10)))
    116986-image.png

    Please note, these results would be changed if you press F9.

    Any misunderstanding, you may post back.


    If an Answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


  3. m8ksmelaf 21 Reputation points
    2021-07-23T14:00:04.873+00:00

    I presently have a formula that does my statistical calculations (MOE, Confidence, etc) of each population.

    I need to draw a sample from one table which consists of my monthly results, against a 2nd table which consists of how many of each sample I need. Keep in mind the "0" results belong to a list of names that are static, and that list of names will be checked every single month for potential sampling.

    I realize the complexity of my "ask" is greater than I originally stated at the start of this thread - my optimism that I could fill in the gaps got the better of me.

    Any guidance would be incredibly appreciated! Thank you all in advance!117449-sampling-from-two-tables.jpg


  4. Lz._ 9,006 Reputation points
    2021-07-26T08:32:28.367+00:00

    Hi @m8ksmelaf

    Following Comment from earlier today:
    Assuming I understood your challenge I'm very close to a full Power Query solution.
    However, I have a "little" challenge that I can't resolve right now. Before I spend more time on this I wanted to know if you have found a solution or if I should continue my investigations?
    Thanks

    Issue fixed. I now have a full Power Query solution (pic. truncated):

    117874-demo.png

    Let me know...


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.