ST 502 R project 3

For this project you will work in groups of 2. The project involves analyzing a data set using the chi-square test for homogeneity, deriving the likelihood ratio test (LRT) for homogeneity, and conducting a Monte Carlo simulation to determine properties of a similar test. You and your partner will create a final report to turn in.

Please make sure that your R file follows the guidelines on Moodle. If these guidelines are not met, you will lose credit. You can submit a .R or a .Rmd file for the code portion. You should submit an HTML or PDF file for the report portion.

Data Example

Consider three different hospitals. Each hospital has patients that end up with infections. Suppose we have the following data:

Hospital Surgical Site Infections Pneumonia Infections Bloodstream Infections Total

A XXXXXXXXXX

B XXXXXXXXXX

C XXXXXXXXXX

Here we might consider that for a given hospital, we have a random sample with a fixed number of trials. This implies that we have three separate multinomial distributions, one for each hospital. We might have an interest in whether or not the multinomials are homogeneous across the hospitals.

Use R to conduct a chi-square test for homogeneity using this data. You’ll need to manually create the table (probably easiest to just use the matrix() function; leave off the Total column). I’d like you to manually calculate the LRT statistic, the Pearson chi-square statistic, the critical value, and approximate p-values for the hypotheses using both test statistics (they will be very small).
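A minimal sketch of the manual calculation follows. The counts are hypothetical placeholders, because the actual table values are not shown above, and the 0.05 significance level is also an assumption. The built-in chisq.test() is used only as a cross-check on the hand-computed Pearson statistic.

```r
# Hypothetical counts for illustration only; substitute the real table values.
infections <- matrix(
  c(30, 50, 20,
    40, 30, 30,
    25, 25, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(Hospital  = c("A", "B", "C"),
                  Infection = c("SurgicalSite", "Pneumonia", "Bloodstream"))
)

n        <- sum(infections)
# Expected counts under homogeneity: (row total * column total) / n
expected <- outer(rowSums(infections), colSums(infections)) / n

pearson <- sum((infections - expected)^2 / expected)
lrt     <- 2 * sum(infections * log(infections / expected))  # assumes no zero cells

df        <- (nrow(infections) - 1) * (ncol(infections) - 1)
critical  <- qchisq(0.95, df)                       # assumed alpha = 0.05
p_pearson <- pchisq(pearson, df, lower.tail = FALSE)
p_lrt     <- pchisq(lrt, df, lower.tail = FALSE)

# Cross-check the manual Pearson statistic against the built-in test
check <- chisq.test(infections, correct = FALSE)
```

Either statistic leads to rejection of homogeneity when it exceeds critical, or equivalently when its p-value falls below the chosen level.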

Derivation

Ok, we’ve used the likelihood ratio test. Let’s derive it! Consider the generic case of comparing J independent multinomial distributions, each with I categories.

Multinomial   Cat 1   Cat 2   ...   Cat I   Sample size
1             π11     π21     ...   πI1     n1
2             π12     π22     ...   πI2     n2
...           ...     ...     ...   ...     ...
J             π1J     π2J     ...   πIJ     nJ

We want to test if

H0 : π11 = π12 = ... = π1J, π21 = π22 = ... = π2J, ..., and πI1 = πI2 = ... = πIJ

vs

HA : At least some probabilities differ

The likelihood here (in the general case) is just the product of the J multinomials.

Under the null hypothesis, π11 = π12 = ... = π1J, so we can just replace these with a common π1. Similarly for the other categories, so under the null we only need to consider π1, ..., πI.
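Written out, the two likelihoods look as follows. This is a sketch of the setup only, and the notation n_{ij} for the observed count in category i of multinomial j (with row total n_{i\bullet} = \sum_j n_{ij}) is an assumption introduced here.

```latex
% General case: product of J independent multinomials
L(\pi) = \prod_{j=1}^{J} \left[ \frac{n_j!}{\prod_{i=1}^{I} n_{ij}!}
         \prod_{i=1}^{I} \pi_{ij}^{\,n_{ij}} \right],
\qquad \sum_{i=1}^{I} \pi_{ij} = 1 \ \text{for each } j

% Under H_0: a common \pi_i replaces \pi_{i1}, \dots, \pi_{iJ}
L_0(\pi_1, \dots, \pi_I) = \prod_{j=1}^{J} \left[ \frac{n_j!}{\prod_{i=1}^{I} n_{ij}!}
         \prod_{i=1}^{I} \pi_{i}^{\,n_{ij}} \right]
\;\propto\; \prod_{i=1}^{I} \pi_i^{\,n_{i\bullet}},
\qquad \sum_{i=1}^{I} \pi_i = 1
```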

Derive the likelihood ratio test for the homogeneity test. Remember it should come out to be

$$\mathrm{LRT} = 2 \sum_{j=1}^{J} \sum_{i=1}^{I} \mathrm{Obs}_{ij} \ln\left( \frac{\mathrm{Obs}_{ij}}{\mathrm{Exp}_{ij}} \right)$$

with approximate large-sample distribution $\chi^2_{(I-1)(J-1)}$ and expected cell counts given by $\mathrm{Exp}_{ij} = n_{\bullet j}\, n_{i\bullet} / n$, where $n$ is the total sample size.
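As a hedged outline of where that form comes from (not a complete derivation), write n_{ij} for Obs_{ij}, an assumed shorthand, and plug the maximum likelihood estimates under each hypothesis into the ratio. The Lagrange-multiplier steps that give the estimates are omitted here.

```latex
% MLEs: unrestricted, and under H_0 (n_{\bullet j} = n_j is the sample size of multinomial j)
\hat{\pi}_{ij} = \frac{n_{ij}}{n_{\bullet j}}, \qquad
\hat{\pi}_{i} = \frac{n_{i\bullet}}{n}

% Likelihood ratio: the multinomial coefficients cancel
\Lambda = \frac{\prod_{j=1}^{J} \prod_{i=1}^{I} \hat{\pi}_{i}^{\,n_{ij}}}
               {\prod_{j=1}^{J} \prod_{i=1}^{I} \hat{\pi}_{ij}^{\,n_{ij}}}

% Test statistic
-2 \ln \Lambda
  = 2 \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij} \ln\left( \frac{n_{ij} / n_{\bullet j}}{n_{i\bullet} / n} \right)
  = 2 \sum_{j=1}^{J} \sum_{i=1}^{I} n_{ij} \ln\left( \frac{n_{ij}}{n_{\bullet j}\, n_{i\bullet} / n} \right)

% Degrees of freedom: free parameters in general minus free parameters under H_0
J(I-1) - (I-1) = (I-1)(J-1)
```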

Note: We did some of the details in the notes (like looking at the null maximum). You need to derive the test form first and then reproduce the other relevant parts.

Simulation

The Pearson statistic can be derived as a Taylor series approximation to the LRT. For the last part of the project, we’ll investigate the α control of the Pearson chi-square test and its power (so we don’t have to worry about the ln(0) that can sometimes pop up for the LRT).

Goal of simulation study:

• Determine how well the asymptotic rejection region performs at controlling α

• Determine the power of the asymptotic test when comparing certain alternative situations

Setup:

• Two-multinomial case only, where each multinomial has three categories

• All combinations of four sample sizes for each multinomial (16 total cases)

n1 = 20, 30, 50, 100

n2 = 20, 30, 50, 100

• Three different probability vectors that may generate a particular multinomial:

p1 = (1/3, 1/3, 1/3) (equal)

p2 = (1/10, 3/10, 6/10) (mixed 1)

p3 = (1/10, 1/10, 8/10) (mixed 2)

• Use 50000 randomly created tables (but start with a much smaller number until you get your code working)

• Add 0.5 to any expected counts that end up being 0 so as to avoid the divide by 0 case
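The zero-count adjustment in the last bullet can be sketched as a small helper. The name pearson_stat and the example table are hypothetical, chosen only for illustration. An expected count is 0 exactly when a whole column of the simulated table is 0, which can happen with small samples and a category probability of 1/10.

```r
# Pearson chi-square statistic for a table of counts, with 0.5 added to
# any expected count that is exactly 0 (avoids dividing by zero).
pearson_stat <- function(tab) {
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  expected[expected == 0] <- 0.5
  sum((tab - expected)^2 / expected)
}

# Hypothetical table in which category 1 was never observed
tab <- rbind(c(0, 7, 13),
             c(0, 9, 21))
stat <- pearson_stat(tab)
```

Without the adjustment, the first column would contribute 0/0 terms and the statistic would be NaN.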

To determine α control, you should generate data where both multinomials come from the same p vector. This should be done for each of the sample size combinations (16 total situations where both multinomials are generated from p1, for instance).

In total, you’ll have 48 simulated α values. You should create a plot similar to the one below to summarize.

To inspect the power, we’ll use the same sample sizes and probabilities, but we’ll vary the probabilities used rather than using the same one for each multinomial.

• Compare Equal vs Mixed 1

• Compare Equal vs Mixed 2

• Compare Mixed 1 vs Mixed 2

You should summarize your results into something similar to that below.

Some coding hints:

• rmultinom(1, size, prob) can be used to generate one multinomial sample

• Two calls of this (using appropriate size and prob) would create a single ‘table’ to be analyzed

• You can combine the two samples using cbind() and then transpose the result using t() to get it in a similar form to how you analyzed the hospital data

• Calculate the Pearson chi-square value and compare it to the appropriate theoretical cut-off (returns a TRUE/FALSE)

• If you wrap all of the above into a replicate(N, { code to do above }), you can then just take the mean of the result to find the approximated alpha or power value

• You can then copy and paste this a bunch of times or wrap that process in a function that allows you to change n1, n2, prob1, prob2 (corresponding to the two multinomials)
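Put together, the hints above might look like the following sketch. The function name sim_reject, the default N, the seed, and the 0.05 level are all assumptions for illustration. N is kept small here so the code runs quickly, whereas the setup calls for 50000 tables.

```r
# Estimated rejection rate of the Pearson test for two multinomial samples.
# Gives an alpha estimate when prob1 == prob2 and a power estimate otherwise.
sim_reject <- function(n1, n2, prob1, prob2, N = 1000, alpha = 0.05) {
  # Two multinomials, so df = (2 - 1) * (number of categories - 1)
  cutoff <- qchisq(1 - alpha, df = length(prob1) - 1)
  rejections <- replicate(N, {
    # One 2 x 3 table: rows are the two samples, columns the categories
    tab <- t(cbind(rmultinom(1, n1, prob1), rmultinom(1, n2, prob2)))
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    expected[expected == 0] <- 0.5   # zero expected counts get 0.5 added
    sum((tab - expected)^2 / expected) > cutoff
  })
  mean(rejections)
}

set.seed(502)
p1 <- c(1, 1, 1) / 3    # equal
p2 <- c(1, 3, 6) / 10   # mixed 1

# alpha control: both samples from p1, over all 16 sample-size pairs
sizes <- expand.grid(n1 = c(20, 30, 50, 100), n2 = c(20, 30, 50, 100))
sizes$alpha_hat <- mapply(sim_reject, sizes$n1, sizes$n2,
                          MoreArgs = list(prob1 = p1, prob2 = p1, N = 500))

# power: Equal vs Mixed 1 at a single sample-size pair
power_hat <- sim_reject(50, 50, p1, p2, N = 500)
```

Repeating the mapply() call with p2 and p3 in place of p1 would give the remaining α estimates, and swapping in the other probability pairs would give the other power comparisons.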

Report

You should then write up all of the above into a coherent report with the following pieces:

• Introduce the general idea of testing for homogeneity in an introduction

• Analyze the dataset given and briefly discuss what the results are

• Derive the homogeneity test (you should use MathType, LaTeX, markdown, etc. to typeset your math symbols)

• Describe the simulation you will do. Your report should include the R code in the text or in an appendix. The plots should be within the text with a brief discussion of the results.

That’s it! You’ve then finished ST 502 - woot

May 06, 2021
