
The 20th Applied Statistics Symposium
June 26th-29th, 2011
New York City

Published by: International Chinese Statistical Association
Edited by: Tian Zheng and Mengling Liu
Designed by: Tian Zheng and Ying Wei
Produced using: R code and a LaTeX template adapted from the generbook package by Lara Lusa and Andrej Blejec.

International Chinese Statistical Association
The 20th Applied Statistics Symposium 2011
Conference Information, Program and Abstracts
June 26 - 29, 2011
The Westin New York Hotel at Times Square
New York, New York, USA

Organized by International Chinese Statistical Association

© 2011 International Chinese Statistical Association

Contents

Welcome
Conference Information
  Committees
  Acknowledgements
  Conference Venue Information
  Program Overview
  Keynote Lectures
  Student Paper Awards
  Short Courses
  Social Program
  ICSA 2012 in Boston, MA
Scientific Program
  Monday, June 27, 8:00 AM - 10:30 AM
  Monday, June 27, 10:45 AM - 12:25 PM
  Monday, June 27, 2:00 PM - 3:40 PM
  Monday, June 27, 3:50 PM - 5:30 PM
  Tuesday, June 28, 8:45 AM - 9:45 AM
  Tuesday, June 28, 10:00 AM - 11:40 AM
  Tuesday, June 28, 1:30 PM - 3:10 PM
  Tuesday, June 28, 3:30 PM - 5:10 PM
  Wednesday, June 29, 8:45 AM - 10:25 AM
  Wednesday, June 29, 10:45 AM - 12:25 PM
Abstracts
  Session 1: Statistical Challenges Arising from Design and Analysis of NIH Studies
  Session 2: Statistical Challenges from Survey, Behavioral and Social Data
  Session 3: Enhancing Probability of Success Using Modeling & Simulation
  Session 4: Financial Statistics, Risk Management, and Rare Event Modeling
  Session 5: Innovative Drug Safety Graphics Presentation
  Session 6: Topics in Statistical Machine Learning and High Dimensional Data Analysis
  Session 7: High-Dimensional Feature Selection, Classification and Dynamic Modeling for Genetics Applications
  Session 8: Recent Developments in Design and Analysis for High-Dimensional Data
  Session 9: Pharmaceutical Safety
  Session 10: Lifetime Data Analysis
  Session 11: High Dimensional Statistical Learning
  Session 12: Multiplicity Issues and Predictive Modeling of Enrollment in Clinical Trials
  Session 13: Statistical Issues in Late-Stage HIV Clinical Trials
  Session 14: New Developments in Methods for the Analysis of Repeated Measures Data
  Session 15: Statistical Methods in Biomarker Discovery
  Session 16: Advancing Statistical Methodology for Handling Missing Data in Longitudinal Studies
  Session 17: Developments and Applications of Models with Time-Varying Covariates or Coefficients
  Session 18: Mixture Models
  Session 19: Design and Analysis of Biomedical Studies
  Session 20: Bridging and multi-regional clinical trials
  Session 21: Novel Approaches to the Genetic Dissection of Complex Traits
  Session 22: Non-/Semi-Parametric Models for Complex Data

  Session 23: Time to Event Data Analysis
  Session 24: Adaptive Design in Clinical Trials
  Session 25: Model Selection and Its Application in Clinical Trial Design
  Session 26: Challenges in Comparative Effectiveness Research
  Session 27: Challenges and Developments in Survival Analysis
  Session 28: Design and Analysis of Clinical Trials
  Session 29: Law and Statistics
  Session 30: Stochastic Root-Finding and Optimization
  Session 31: Functional Data Analysis
  Session 32: Recent Developments and Future Perspectives in Statistical Methods in Longitudinal Data
  Session 33: High-Dimensional Inference in Biostatistical Applications
  Session 34: Statistical Machine Learning
  Session 35: Adaptive Designs Post-FDA Guidance: Challenges and Solutions
  Session 36: Spatial Statistics in Bio-medical Applications
  Session 37: Nonparametric Inference and Secondary Analysis in Genome-wide Studies
  Session 38: Manufacturing and Quality Assessment
  Session 39: Next Generation Pharmacovigilance: Methodological and Policy Challenges
  Session 40: Statistics in Drug Discovery and Early Development
  Session 41: Large Scale Data Analysis and Dimension Reduction Techniques in Regression Models
  Session 42: New Approaches for Design and Estimation Issues in Clinical Trials
  Session 43: Recent Development in Multivariate Survival Data Analysis
  Session 44: Interface Between Nonparametric and Semiparametric Analysis and Genetic Epidemiology
  Session 45: High Dimensional Statistics in Genomics
  Session 46: Biomarker Discovery and Individualized Medicine
  Session 47: Applications in Spatial Statistics
  Session 48: The Totality of Evidence in Safety and Efficacy Evaluation of Medical Products
  Session 49: Estimating Treatment Effects in Randomized Clinical Trials with Non-compliance and Missing Outcomes
  Session 50: Analysis of Biased Survival Data
  Session 51: Suicide Research Methodology
  Session 52: Joint Modeling of Longitudinal and Time-to-Event Data in Medical Research
  Session 53: Survey Research Method and Its Application in Public Health
  Session 54: J P Hsu Memorial Session
  Session 55: Statistics in Environmental, Financial and Social Science
  Session 56: Meta Analysis and Evidence from Large Scale Studies
  Session 57: Semiparametric Models with Application in Biosciences
  Session 58: Statistical Methods for Analysis of Next Generation Sequencing Data
  Session 59: Network and Related Topics
  Session 60: Current Approaches for Pharmaceutical Benefit-Risk Assessment
  Session 61: Causal Inference and its Applications in Drug Development
  Session 62: Recent Advances in Multiplicity Approaches
  Session 63: Bioassay: Methodology for a Rapidly Developing Area
  Session 64: Statistical Challenges in Developing Biologics
  Session 65: Experimental Design and Clinical Trials
  Session 66: Advancing Clinical Trial Methods
  Session 67: Panel Session I: Adaptive Designs—When Can and How Do We Get There From Here?
  Session 68: Theoretical Developments
  Session 69: Bayesian Inferences and Applications
  Session 70: Statistical Analysis on Spatial and Temporal Data
  Session 71: Recent Developments in Methods for Handling Missing Data
  Session 72: Methodology for and Applications of Administrative Data
  Session 73: Fiducial Inference, Generalized Inference, and Applications
  Session 74: Functional Data Analysis
  Session 75: Statistical Method and Theory for High-Dimensional Data
  Session 76: Statistical Genomics
  Session 77: Assessment of Blinding and Placebo Effect
  Session 78: Recent Advances in Survival Analysis and Clinical Trials
  Session 79: Biomarker Based Adaptive Design and Analysis for Targeted Agent Development
  Session 80: Analysis of Complex Data
  Session 81: Historical Insight on Statisticians' Role in Pharmaceutical Development
  Session 82: Handling Heaping
  Session 83: Panel Session II: Industry-Academia Partnership: Successful Stories and Opportunities
  Session 84: Statistical Methods for Disease Genetics and Genomics
  Session 85: Enhancing Clinical Development Efficiency with Adaptive Decision Making
  Session 86: Multivariate and Subgroup Analysis
  Session 87: Statistical Issues Arising from Clinical Research
  Session 88: Recent Developments in Modeling Data with Informative Cluster Size
  Session 89: New Developments in High Dimensional Variable Selection
  Session 90: Model Selection and Related Topics
  Session 91: Empirical Likelihood and Its Application
  Session 92: Design and Analysis Issues in DNA Methylation
  Session 93: Recent Advances in Statistical Inference for Functional and Longitudinal Data
  Session 94: Emerging Statistical Methods and Theories for Complex and Large Data
  Session 95: Complex Multivariate Outcomes in Biomedical Science
  Session 96: Application of Machine Learning Approaches in Biomedical Research
  Session 97: Challenges in the Development of Regression Models
  Session 98: Recent Development in Measurement Error Models
  Session 99: Recent Developments in Time Series
  Session 100: Challenging Topics in Longitudinal Data
  Session 101: High-Dimensional Models
  Session 102: Phase I/II Clinical Studies: Safety versus Efficacy
  Session 103: Network Analysis
  Session 104: Recent Advances in Genome-Wide Association Studies
  Session 105: Statistical Methodology and Regulatory Issues in Drug Development in Multiple Regions
  Session 106: Genomic Biomarker Applications in Clinical Studies
  Session 107: Statistical Modeling and Application in Systems Biology
  Session 108: Statistical Methodologies
  Session 109: Functional and Nonlinear Curve Analysis
  Session 110: Statistical Methods for High-Dimensional Data or Large Scale Studies
  Session 111: Special Presentation: CDISC Standard Development, Implementation Strategy, and Case Study
Index of Authors

Welcome

With great pleasure, on behalf of the ICSA 2011 organizing committee, I welcome you to New York City for the 20th ICSA Applied Statistics Symposium.

This year, the ICSA symposium reaches its 20th anniversary. We are proud to host this exciting milestone event. With your enthusiastic support, we are thrilled to have a historic high of more than 500 attendees. As a result, the symposium offers a rich and diverse scientific program, with 7 short courses and a total of 114 scientific sessions. We believe you will find that it provides a great opportunity for discussion, learning and networking. We have also arranged social events: a mixer, a cruise night and a banquet. It is our sincere hope that you find the symposium interesting, enjoyable and memorable.

Moreover, New York City is one of the most populous metropolitan areas in the world, with countless famous landmarks and attractions. Our symposium venue, the Westin New York at Times Square, is located at the center of Times Square, and we believe you will have a true New York experience.

I would like to take this opportunity to extend my gratitude to all of you, the participants, who have made this symposium a reality. In particular, I would like to thank our organizing committee members and volunteers, who have spent tremendous time and effort on the preparation, and our program committee members and session organizers for organizing the invited and contributed sessions. Finally, on behalf of the organizing committee, I acknowledge the generous support of all sponsors and exhibitors.

Enjoy the symposium!

Zhezhen Jin
Chair of the ICSA 2011 organizing committee

Committees

Executive Committee
Jin, Zhezhen - Columbia University
Li, Gang - Johnson & Johnson
Li, Mark Chunming - Pfizer Inc.
Liu, Mengling - New York University
Quan, Hui - Sanofi-Aventis U.S. LLC
Wei, Ying - Columbia University
Zhang, Wei - Boehringer Ingelheim Pharmaceuticals, Inc.
Zheng, Tian - Columbia University

Program Committee
Chen, James - U.S. Food and Drug Administration
Duan, Naihua - Columbia University
He, Weili - Merck & Co., Inc.
He, Wenqing - University of Western Ontario
Hu, Inchi - Hong Kong University of Science and Technology
Hu, Joan - Simon Fraser University
Jiang, Qi - Amgen Inc.
Jin, Jiashun - Carnegie Mellon University
Li, Hongzhe - University of Pennsylvania
Li, Runze - The Pennsylvania State University
Liu, Aiyi - National Institutes of Health
Liu, Dacheng - Boehringer Ingelheim Pharmaceuticals
Loh, Ji Meng - AT&T Labs Research
Meng, Zhaoling - Sanofi-Aventis U.S. LLC
Pan, James - Johnson & Johnson
Qu, Annie - University of Illinois at Urbana-Champaign
Shao, Yongzhao - New York University
Soon, Guoxing - U.S. Food and Drug Administration
Wang, Susan - Boehringer Ingelheim Pharmaceuticals
Zhang, David - Genentech, Inc.
Zhang, Ying - University of Iowa
Zhao, Hongyu - Yale University
Zheng, Gang - National Institutes of Health
Zhu, Ji - University of Michigan


ICSA 2011 Applied Statistics Symposium Sponsors

NYC Metropolitan Area Chapter
Princeton-Trenton Chapter

(Sponsor logos appear here in the printed program.)

ICSA 2011 Applied Statistics Symposium Exhibitors

Cambridge University Press
CRC Press - Taylor & Francis Group
EDETEK
K-Force
Smith Hanley
Springer

Please visit our exhibitors, located on the 9th Floor of the Westin at Times Square, "The New York Atrium".

Venue Information and Floor Plans

Conference Venue
The Westin New York at Times Square
270 West 43rd Street, New York, NY 10036
Phone: 212-201-2700; Fax: 212-201-2701

A pocketsize street and subway map of Manhattan can be found in your conference bag.

Quick reference:
Registration and conference information desk - The New York Atrium (9th Floor)
Exhibitions - The New York Atrium (9th Floor)

Meeting rooms:
Belasco - 3rd Floor
Booth Boardroom - 3rd Floor
Gramercy - 4th Floor
Imperial - 4th Floor
Majestic (I/II) - 5th Floor
Manhattan - 5th Floor
Melville - 5th Floor
Minskoff - 9th Floor
Nederlander - 9th Floor
New Amsterdam - 9th Floor
Palace - 9th Floor
Pearl - 9th Floor
Plymouth - 9th Floor
Royale - 9th Floor


Venue Information and Floor Plans: floor plan diagrams of the conference venue appear here in the printed program.

Program Overview

Sunday, June 26th, 2011
8:00 AM - 6:00 PM, Atrium: Registration
8:00 AM - 5:00 PM, Nederlander: Short Course: Multiple Comparisons in Clinical Trials
8:00 AM - 12:00 PM, Royale: Short Course: Design and Analysis of Group Sequential Trials: Recent Advances and Software
8:00 AM - 12:00 PM, New Amsterdam: Short Course: Dose finding studies: methods and implementation
8:00 AM - 12:00 PM, Minskoff: Short Course: Group sequential method in survival analysis
12:00 PM - 1:00 PM: Lunch for registered short course attendees
1:00 PM - 5:00 PM, Minskoff: Short Course: Concepts in equivalence/noninferiority testing: issues and challenges
1:00 PM - 5:00 PM, New Amsterdam: Short Course: Strategies for extracting reliable information from Megavariate data
7:00 PM - 9:00 PM, Atrium: Opening Mixer

Monday, June 27th, 2011
7:30 AM - 6:00 PM, Atrium: Registration
8:00 - 8:15 AM, Majestic (I/II): Welcome
8:20 - 9:20 AM, Majestic (I/II): Keynote I: David Donoho, Stanford University
9:25 - 10:25 AM, Majestic (I/II): Keynote II: Ji Zhang, Sanofi-Aventis U.S. LLC
10:25 - 10:45 AM, Atrium: Break
10:45 AM - 12:25 PM, See program: Parallel Sessions
12:25 PM - 2:00 PM: Lunch on own
2:00 - 3:40 PM, See program: Parallel Sessions
3:40 - 3:50 PM, Atrium: Break
3:50 - 5:30 PM, See program: Parallel Sessions
6:30 PM - 10:00 PM, Off site: Cruise Night

Tuesday, June 28th, 2011
8:30 AM - 5:30 PM, Atrium: Registration
8:45 - 9:45 AM, Majestic (I/II): Keynote III: Danyu Lin, University of North Carolina at Chapel Hill
9:45 - 10:00 AM, Atrium: Break
10:00 - 11:40 AM, See program: Parallel Sessions
11:40 AM - 1:30 PM: Lunch on own
1:30 - 3:10 PM, See program: Parallel Sessions
3:10 - 3:30 PM, Atrium: Break
3:30 - 5:10 PM, See program: Parallel Sessions
6:30 PM - 9:00 PM, Off site: Banquet (bus leaves at 5:20 PM)

Wednesday, June 29th, 2011
8:30 AM - 1:00 PM, Atrium: Registration
8:45 - 10:25 AM, See program: Parallel Sessions
10:25 - 10:45 AM, Atrium: Break
10:45 AM - 12:25 PM, See program: Parallel Sessions


Keynote Lectures

Monday, June 27th, 8:20 AM - 9:20 AM

David Donoho
Anne T. and Robert M. Bass Professor of Humanities and Sciences, Professor of Statistics, Stanford University

Title: Compressed Sensing and Sparse Effects Models in Statistics

I will briefly review some recent developments in information theory and signal processing, often collectively called "Compressed Sensing". Some achievements of this methodology include speeding up pediatric MRIs by an order of magnitude and speeding up genotyping for rare diseases, also by an order of magnitude. The roots of these achievements are firmly statistical and can be traced to sparse effects models, which are currently very popular in statistics. I will review some of what is known about sparse effects models in various contexts, and methodology that became popular in recent years because of such models, including soft thresholding, the LASSO, False Discovery Rate control methods and Higher Criticism methods. I will also review the many connections that exist with robust statistics, with Bayesian statistics, and, in the case of Compressed Sensing, with experimental design and with other fields. At the same time I will review the very different types of connections that exist with far-flung topics such as the high-dimensional geometry of convex bodies and multi-user information theory. I maintain that as statisticians we should be proud that our field has proved to be so useful and so connected to such wide-ranging intellectual advances.
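As a small, purely illustrative companion to the sparse-effects theme of this talk (this is not material from the lecture itself; the simulation settings are invented), soft thresholding at the so-called universal threshold can be sketched in a few lines of R:

```r
# Sparse effects model: y_i = mu_i + noise, with only a few nonzero mu_i
set.seed(1)
n  <- 1000
mu <- c(rep(5, 20), rep(0, n - 20))      # 20 true signals out of 1000 coordinates
y  <- mu + rnorm(n)

soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

lambda <- sqrt(2 * log(n))               # "universal" threshold
mu_hat <- soft_threshold(y, lambda)

mean((mu_hat - mu)^2)                    # mean squared error of the estimate
sum(mu_hat != 0)                         # coordinates surviving thresholding
```

The same shrink-toward-zero operation is what the LASSO applies coordinate-wise in orthogonal designs, which is one way the sparse-effects viewpoint connects to the methods listed in the abstract.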


Monday, June 27th 9:25 AM-10:25 AM

Ji Zhang
Vice President, Head of Clinical Sciences and Operations, Sanofi-Aventis U.S. LLC

Speaker Biography: Ji Zhang is Vice President, Head of the Clinical Sciences and Operations Scientific Core Platform, which provides scientific expertise and operational services in support of clinical development. It consists of more than 2,400 researchers in more than 30 countries, a complete early phase development group, and groups specialized in the set-up, conduct, data collection and management, statistical analysis and reporting of clinical trials and observational studies. Ji is a member of the executive committee of sanofi global R&D, and a member of the investment committee, which reviews and decides on project progressions. He joined sanofi-aventis in 2003 as Vice President, Head of Biostatistics and Programming. Prior to joining sanofi-aventis, he was Senior Director, Clinical Biostatistics, at Merck Research Laboratories. Ji holds a B.S. and an M.S. in mathematics, probability and statistics from Peking University, China. He earned a Ph.D. in statistics from North Carolina State University.

Title: Growth with ICSA, and continued growth and diversification in the pharmaceutical industry

Abstract: Over the last 20+ years, there has been tremendous growth and success in the pharmaceutical industry despite some setbacks and recent challenges. Correspondingly, there has been healthy growth in the statistical profession, with some mature and well-established areas. However, growth has slowed significantly in recent years. We examine some current challenges we face today, and contemplate new areas where statisticians should seek to engage, such as deeper symbiotic engagement in multidisciplinary discovery teams, translational medicine, comparative effectiveness, proactive safety signal detection and ascertainment, innovations in (data-driven) clinical trial design, total-quality-principle-guided study conduct and monitoring, and quantitative decision evaluations and decision making. In addition, growth should also come from emerging countries, e.g., China, where the statistical profession has started but has yet to experience dramatic growth. We believe our profession will continue to grow in more diversified areas with a total customer-focused/engaged approach.

Tuesday June 28th 8:45 AM-9:45 AM

Danyu Lin
Dennis Gillings Distinguished Professor of Biostatistics, University of North Carolina at Chapel Hill

Speaker Biography: Danyu Lin is the Dennis Gillings Distinguished Professor of Biostatistics at the University of North Carolina at Chapel Hill. Professor Lin is an internationally recognized leader in survival analysis and statistical genetics. He has published over 130 peer-reviewed papers, most of which appeared in premier statistical journals. Several of his methods have been incorporated into commercial software packages and are widely used in practice. Professor Lin is on Thomson ISI's list of Highly Cited Researchers in Mathematics. He is a former recipient of the Mortimer Spiegelman Gold Medal from the American Public Health Association and a Fellow of both the American Statistical Association and the Institute of Mathematical Statistics. He currently serves as an Associate Editor of Biometrika and a Consultant to the FDA.


Title: Recent Developments in the Statistical Analysis of Recurrent Event Data

Abstract: Recurrent events (e.g., multiple infections and tumor recurrences) arise frequently in clinical trials. Performing standard survival analysis on the time to the first event does not fully utilize the available data and does not provide information about treatment effectiveness beyond the first event. The proportional intensity model of Andersen and Gill (1982) enables one to evaluate treatment effects on the entire recurrent event process. Unfortunately, its underlying assumption of independent recurrent event times is often violated in clinical trials. There are two major approaches to relaxing the independence assumption: one is to formulate the marginal distribution of the recurrent event process through the proportional means/rates model while leaving the dependence structure completely unspecified, and the other is to characterize the dependence through frailty or random effects. The former is more robust, whereas the latter is more efficient and allows the use of prior event history to predict future recurrences. A further complication in the analysis of recurrent event data is that patients may drop out of the trial for health-related reasons (e.g., death and toxicities), in which case the usual independent censoring assumption is likely violated. One may adjust for such informative drop-out by joint modeling of recurrent and terminal events. In this presentation, I will describe all of the aforementioned methods and provide illustrations with real data.
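For readers who want to try the first two approaches mentioned above, here is a minimal R sketch (not the speaker's code or data; it uses the bladder tumour recurrence data shipped with the survival package purely for illustration):

```r
library(survival)

# Andersen-Gill proportional intensity model (assumes independent increments)
ag_fit <- coxph(Surv(start, stop, event) ~ rx + number + size,
                data = bladder2)

# Marginal proportional rates model: same point estimates, but a robust
# sandwich variance that allows arbitrary within-patient dependence
lwyy_fit <- coxph(Surv(start, stop, event) ~ rx + number + size + cluster(id),
                  data = bladder2)

summary(ag_fit)
summary(lwyy_fit)   # note the "robust se" column produced by cluster(id)
```

Frailty and joint recurrent-terminal event models, the other approaches described in the abstract, require additional machinery and are not sketched here.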


International Chinese Statistical Association
June 26-29, 2011 - New York, NY

SAVE 20%*

NEW AND NOTEWORTHY TITLES!

Handbook of Markov Chain Monte Carlo
Steve Brooks, Andrew Gelman, Galin Jones, Xiao-Li Meng
Price: $99.95; Cat. #: C7941; ISBN: 9781420079418; Pub Date: May 2011

Principles of Uncertainty
Joseph B. Kadane
Price: $89.95; Cat. #: K12848; ISBN: 9781439861615; Pub Date: May 2011

R Graphics, Second Edition
Paul Murrell
Price: $79.95; Cat. #: K11535; ISBN: 9781439831762; Pub Date: June 2011

Design and Analysis of Non-Inferiority Trials
Mark D. Rothmann, Brian L. Wiens, Ivan S.F. Chan
Price: $89.95; Cat. #: C8040; ISBN: 9781584888048; Pub Date: July 2011

Statistical Evaluation of Diagnostic Performance: Topics in ROC Analysis
Kelly H. Zou, Aiyi Liu, Andriy I. Bandos, Lucila Ohno-Machado, Howard E. Rockette
Price: $89.95; Cat. #: K10617; ISBN: 9781439812228; Pub Date: July 2011

Order online at www.crcpress.com to get FREE standard shipping.
*Use Promo Code 194CM to apply discount. Hurry! Offer expires Aug. 31, 2011.

SIGN UP ONLINE AND RECEIVE INFORMATION ABOUT OUR LATEST OFFERINGS AND SPECIAL DISCOUNTS!

CRC Press / Taylor & Francis Group
6000 Broken Sound Parkway, NW, Suite 300, Boca Raton, FL 33487
Tel: 1-800-272-7737; Fax: 1-800-374-3401; e-mail: [email protected]

Great New Titles from Cambridge University Press

Handbook of Functional MRI Data Analysis
Russell A. Poldrack, Jeanette Mumford, Thomas Nichols
$80.00: Hb: 978-0-521-51766-9: 248 pp.

Optimal High-Throughput Screening: Practical Experimental Design and Data Analysis for Genome-Scale RNAi Research
Xiaohua Douglas Zhang
$99.00: Hb: 978-0-521-51771-3; $39.99: Pb: 978-0-521-73444-8: 232 pp.

Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction
Bradley Efron
Institute of Mathematical Statistics Monographs
$65.00: Hb: 978-0-521-19249-1: 276 pp.

Negative Binomial Regression, Second Edition
Joseph M. Hilbe
$85.00: Hb: 978-0-521-19815-8: 572 pp.

Numerical Methods of Statistics, Second Edition
John F. Monahan
Cambridge Series in Statistical and Probabilistic Mathematics
$130.00: Hb: 978-0-521-19158-6; $55.00: Pb: 978-0-521-13951-9: 464 pp.

The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the Interpretation of Research Results
Paul D. Ellis
$95.00: Hb: 978-0-521-19423-5; $34.99: Pb: 978-0-521-14246-5: 192 pp.

Statistical Learning for Biomedical Data
James D. Malley, Karen G. Malley, Sinisa Pajevic
Practical Guides to Biostatistics and Epidemiology
$105.00: Hb: 978-0-521-87580-6; $48.00: Pb: 978-0-521-69909-9: 298 pp.

Data Analysis Using Regression and Multilevel/Hierarchical Models
Andrew Gelman, Jennifer Hill
Analytical Methods for Social Research
$112.00: Hb: 978-0-521-86706-1; $50.00: Pb: 978-0-521-68689-1: 648 pp.

Prices are subject to change.
www.cambridge.org/us/statistics
800.872.7423

Student Paper Awards

Jiann-Ping Hsu Pharmaceutical and Regulatory Sciences Student Paper Award

Brian Claggett, Harvard School of Public Health
Title: Estimating Subject-Specific Treatment Differences for Risk-Benefit Assessment with Competing Risk Event-Time Data
Time: Tuesday, June 28th, 10:00 AM - 11:40 AM
Session 54: J P Hsu Memorial Session (Majestic (I/II), 5th Floor)

Student Travel Awards

Ada Lau, University of Oxford
Title: Spatiotemporal Wind Power Probabilistic Forecasts Using Latent Gaussian Processes

Dungang Liu, Rutgers University
Title: Exact meta-analysis approach for the common odds ratio of 2 × 2 tables with rare events
Time: Monday, June 27th, 2:00 PM - 3:40 PM
Session 25: Model Selection and Its Application in Clinical Trial Design (Imperial, 4th Floor)

Ying Yuan, University of North Carolina at Chapel Hill
Title: Varying Coefficient Models for Modeling Diffusion Tensors Along White Matter Fiber Bundles
Time: Monday, June 27th, 2:00 PM - 3:40 PM
Session 17: Developments and Applications of Models with Time-Varying Covariates or Coefficients (Majestic II, 5th Floor)


Short Courses

1. Multiple Comparisons in Clinical Trials (Full day)
H.M. James Hung and Sue-Jane Wang
US Food and Drug Administration

Abstract: For regulatory applications, each clinical trial is almost surely conducted to address multiple objectives, so multiple comparisons are often performed within a single trial. Because a clinical trial is regarded as a human experiment, adequate control of the experimentwise type I error has traditionally been considered important to contain false positives, and statistical methodologies for ensuring such control have accumulated over several decades. As the clinical development program for a medical product under test has evolved, the statistical framework of inference for each clinical trial alone, in relation to a family of multiple trials and to the issue of level of evidence, has become fuzzy. This short course will revisit the statistical framework and paradigm of inference for multiple doses, multiple endpoints, and multiple analyses within a single clinical trial or a family of clinical trials, which may include multi-regional trials. Case examples will be presented to facilitate discussion. The topics to be covered are listed below; a small illustration of multiplicity adjustment follows the list.

1. Early phase learning trials
2. Single versus multiple confirmatory placebo-controlled trials
   - Multiple endpoints
   - Multiple doses
   - Multiple analyses
3. Active-controlled trials
   - Placebo is present
   - Placebo is absent
   - Non-inferiority and superiority analyses
4. Adaptive design trials
   - Early phase trial
   - Pivotal trial
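As a toy illustration of the familywise adjustments discussed in this course (not part of the course materials; the endpoint names and p-values are made up), base R's p.adjust() can be applied to the raw p-values of several endpoints:

```r
# Hypothetical raw p-values for three endpoints in one confirmatory trial
p_raw <- c(endpoint_1 = 0.012, endpoint_2 = 0.034, endpoint_3 = 0.047)

p.adjust(p_raw, method = "bonferroni")  # simple but conservative FWER control
p.adjust(p_raw, method = "holm")        # uniformly more powerful than Bonferroni
p.adjust(p_raw, method = "hochberg")    # valid under nonnegative dependence
```

Each adjusted p-value can then be compared to the nominal level (e.g., 0.05), preserving the experimentwise type I error rate that the abstract emphasizes.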

About the Instructors
Dr. H.M. James Hung is Director of the Division of Biometrics I, Office of Biostatistics, Center for Drug Evaluation and Research (CDER), U.S. Food and Drug Administration (FDA). He is an author of over 90 papers and book chapters in the statistical and medical literature. His research interests include factorial-design clinical trials, utility of the p-value distribution, adaptive design and analysis in clinical trials, non-inferiority trials, and multi-regional clinical trials. He made major contributions to regulatory reviews of many large mortality or morbidity clinical trials in the cardiovascular and renal disease areas. He has presented over 180 invited talks and short courses in his research areas. Dr. Hung received two FDA/CDER Scientific Achievement Awards, one FDA Scientific Achievement Group Award and many other awards in recognition of his scientific contributions to the US FDA. Currently, he serves as an Editor-in-Chief for the Journal of Pharmaceutical Statistics and an Associate Editor for Statistics in Medicine and the Journal of Biopharmaceutical Statistics. He is a Fellow of the American Statistical Association and an elected member of the International Statistical Institute.

Dr. Sue-Jane Wang is Associate Director for Pharmacogenomics and Adaptive Design, Office of Biostatistics, Office of Translational Sciences, CDER, FDA. Dr. Wang is an author of over 90 papers and book chapters in the statistical, clinical, genetic, bioinformatics, and pharmacogenomics literature. She made major contributions to regulatory reviews in these areas. As a result, Dr. Wang received two FDA Outstanding Intercenter Scientific Collaboration Awards and was recently awarded an FDA-level Scientific Achievement individual Award in recognition of her sustained record of published regulatory research in statistical design and methodology advancing complex and emerging clinical trial designs and analyses that support regulatory guidance, policies and review. She has presented over 200 invited talks, discussions, and short courses in her research areas, including non-inferiority and multi-regional trials. She is an elected member of the International Statistical Institute. She has served as an Editor-in-Chief for Pharmaceutical Statistics, and is an Associate Editor for Statistics in Medicine and Statistics in Biosciences. She is a conference co-chair for the MCP 2011 conference.

2. Design and Analysis of Group Sequential Trials: Recent Advances and Software (Half day)
Mei-Chiung Shih and Balasubramanian Narasimhan, Stanford University

Abstract: Group sequential designs that have provisions for data monitoring and interim analyses are now widely used in confirmatory Phase III trials, especially those with survival endpoints. The past decade witnessed important methodological advances in the design of group sequential trials and in the primary and secondary analyses of the data following a group sequential trial. The course gives an introduction to these advances and to open-source software packages available at the Center for Innovative Study Design website at the School of Medicine at Stanford University. It also provides an overview of related chapters in the monograph Sequential Experimentation in Clinical Trials: Design and Analysis, by Jay Bartroff, Tze Leung Lai and Mei-Chiung Shih, Springer, 2011.

About the Instructors
Mei-Chiung Shih, Ph.D. (Stanford University), is Acting Associate Director for Science and Senior Biostatistician at the Department of Veterans Affairs Cooperative Studies Program Coordinating Center at the Palo Alto Health Care System, and Assistant Professor of Biostatistics, Department of Health Research and Policy, at Stanford University. She was a Research Biostatistician at the Schering-Plough Research Institute and an Assistant Professor of Biostatistics at the Harvard School of Public Health before joining Stanford in 2006. She has published over 40 papers on group sequential designs and inference, longitudinal data analysis, statistical genetics, and collaborative clinical research.

Balasubramanian Narasimhan, Ph.D. (Florida State University), is a Senior Research Scientist in the Department of Health Research and Policy and in the Department of Statistics at Stanford University. He held appointments as Assistant Professor at the University of Minnesota and at Pennsylvania State University before joining Stanford University in 1996. Since 1999, he has been the Director of the Data Coordinating Center in the School of Medicine. In 2008, he was appointed the caBIG deployment lead for the Stanford University Cancer Center. He is a leading expert in statistical computing, with particular emphasis on statistical tools for clinical trials and high-throughput data. He has authored several R packages available on CRAN (http://cran.r-project.org) and R-Forge (http://rforge.r-project.org). He is also a co-author of several widely used software packages, such as Significance Analysis of Microarrays (SAM), Prediction Analysis of Microarrays (PAM) and Correlate, that provide user-friendly Excel interfaces to R packages. He is the principal developer of the design and analysis software at the Center for Innovative Study Design at Stanford University.
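As a back-of-the-envelope complement to this course description (this is not the Stanford CISD software; the sample sizes and boundaries below are illustrative assumptions), a short Monte Carlo check in base R shows how a two-look design with O'Brien-Fleming-type boundaries keeps the overall one-sided type I error near 0.025:

```r
set.seed(123)
n_per_look <- 50                 # per-arm observations accrued by each look (assumed)
crit       <- c(2.797, 1.977)    # approximate 2-look O'Brien-Fleming z-boundaries

reject <- replicate(20000, {
  x <- rnorm(2 * n_per_look)     # treatment arm under the null (true effect = 0)
  y <- rnorm(2 * n_per_look)     # control arm
  # Interim z-statistic on the first half, final z-statistic on the full data
  z1 <- (mean(x[1:n_per_look]) - mean(y[1:n_per_look])) / sqrt(2 / n_per_look)
  z2 <- (mean(x) - mean(y)) / sqrt(2 / (2 * n_per_look))
  (z1 > crit[1]) || (z2 > crit[2])
})
mean(reject)   # empirical type I error, close to the nominal 0.025
```

Dedicated group sequential software additionally handles error-spending functions, unequal information times, and post-trial inference, which this sketch deliberately omits.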

3. Dose Finding Studies: Methods and Implementation (Half day)
Frank Bretz and José Pinheiro
Novartis Pharma AG and Johnson & Johnson Pharmaceutical Research and Development

Abstract: Despite revolutionary advances in basic biomedical science, the number of new drug applications has been declining over the past several years. In response, different initiatives, such as the Critical Path Initiative, have been put in place to identify and propose ways to address the key drivers behind this pharmaceutical industry pipeline problem. One well-known cause is poor dose selection for confirmatory trials, resulting from inadequate knowledge of the dose-response relationship (efficacy and safety) at the end of Phase II. This course will discuss the key statistical issues leading to the problems currently observed in dose finding studies, including a review of basic multiple comparisons and modeling methods as traditionally used in these studies. A unified strategy for designing and analyzing dose finding studies, denoted MCPMod, which combines multiple comparisons and modeling, will be the major focus of the course. It will be discussed in detail, including a step-by-step description of its practical implementation. Case studies based on real clinical trials, together with software implemented in R, will be used to illustrate the use of the methodology. A practical exercise motivated by a real dose finding study will allow attendees to get hands-on experience with the methods and the R software.
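To make the modeling half of the MCPMod idea concrete (this sketch is not the course's own R implementation; the doses and mean responses below are invented), a three-parameter Emax curve, a standard dose-response model, can be fit with base R's nls():

```r
dose <- c(0, 10, 25, 50, 100, 150)          # assumed doses
resp <- c(0.2, 0.9, 1.4, 1.8, 2.0, 2.1)     # assumed mean responses at those doses

# Emax model: E0 + Emax * dose / (ED50 + dose)
emax_fit <- nls(resp ~ e0 + emax * dose / (ed50 + dose),
                start = list(e0 = 0.2, emax = 2, ed50 = 25))

summary(emax_fit)                                     # parameter estimates
predict(emax_fit, newdata = data.frame(dose = 75))    # interpolate at a new dose
```

In the full MCPMod strategy this modeling step is preceded by a multiple-contrast test over a candidate set of dose-response shapes, which the course covers in detail.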

About the Instructors
Dr. Frank Bretz received his Ph.D. in Statistics from the University of Hannover, Germany, in 1999. Afterwards, he worked for one year as a biostatistician at Byk Gulden Pharmaceuticals, Konstanz, on the design and analysis of clinical trials. He then moved back to the University of Hannover as an Assistant Professor. Frank finished his post-doctoral thesis ("Habilitation") at the Medical University of Hannover in 2004 and joined the Statistical Methodology group at Novartis Pharma the same year, where he is currently a Biometrical Fellow. He has supported methodological development in various areas of drug development, including dose finding, multiple comparisons, and adaptive designs. Since 2007 he has been an Adjunct Professor at the Medical University of Hannover. Dr. Bretz is a core member of the PhRMA working group on "Adaptive Dose-Ranging Designs". Since 2002 he has been the treasurer of the German Region of the International Biometric Society, and from 2004 until 2007 he was the head of the working group on "Statistical Methods in Bioinformatics" of the German Region / IBS. He is a co-developer of the multcomp software in R for multiple comparison procedures in general linear models. He has authored or co-authored more than 50 articles in peer-reviewed journals and one book.

Dr. José Pinheiro is a Senior Director in the Adaptive Designs group at Johnson & Johnson PRD. Prior to that he worked at Novartis Pharmaceuticals for eight years, most recently as a Senior Biometrical Fellow. He has been involved in the development and implementation of innovative statistical methods for early and late phase clinical trials across a wide range of therapeutic areas, such as oncology, neuroscience, cardiovascular, transplantation, respiratory diseases, and dermatology. He received his Ph.D. in Statistics from the University of Wisconsin-Madison in 1994 and did postdoctoral work for two years in the Biostatistics department at the same university. He is a past chair of the Statistical Computing Section of the ASA, having also served as chair of the Awards Committee for the same section. Since 2005, Dr. Pinheiro has served as a co-leader of the PhRMA PISC working group on Adaptive Dose Ranging Studies and has participated in the PhRMA PISC working group on Adaptive Designs. He is the vice-chair of the Biostatistics and Data Management Technical Group. He was the co-chair for the 2006 ENAR meeting of the International Biometric Society, having also served as Secretary of ENAR. The author of a book on, and of the most widely used S-PLUS and R software for, mixed-effects models, as well as eight book chapters and over 50 refereed papers, Dr. Pinheiro is an associate editor for the Biometrical Journal, Statistics in Biopharmaceutical Research, and Statistics in Biosciences, and has presented over 20 short courses and given over 80 invited presentations at conferences, government agencies, and universities around the world.


4. Group Sequential Methods in Clinical Trials (Half day)
K. K. Gordon Lan
Johnson & Johnson Pharmaceutical Research and Development

About the Instructor
Dr. K. K. Gordon Lan is Senior Director, Johnson & Johnson PRD. He received his Ph.D. in Mathematical Statistics from Columbia University in 1974. Before joining Johnson & Johnson in 2005 as Senior Director of Statistical Science, he held positions as Mathematical Statistician at the National Heart, Lung and Blood Institute (NHLBI/NIH), Professor of Statistics at George Washington University, Distinguished Scientist at Pfizer and Statistics Fellow at Sanofi-Aventis. Gordon has published more than 50 research papers in professional journals and has given more than 200 invited talks at universities and professional meetings. He is interested in statistical methods for clinical trial design and data analysis. Gordon was elected Fellow of the American Statistical Association in 1992 and Fellow of the Society for Clinical Trials in 2009.

5. Basic Concepts in Equivalence/Noninferiority Testing: Issues and Challenges (Half day)
Tie-Hua Ng
US Food and Drug Administration

Abstract: The objective of a noninferiority (NI) trial is to show that the test or experimental treatment is not inferior to the standard therapy or active control by more than a small margin, known as the NI margin. This short course will elaborate on the rationale for choosing the NI margin as a small fraction of the therapeutic effect of the active control relative to placebo when testing the NI hypothesis for a mean difference with a continuous outcome. This NI margin is closely related to M1 and M2, the NI margins discussed in the FDA draft guidance on NI clinical trials issued in March 2010. For testing the NI hypothesis for a mean ratio with a continuous outcome, a similar NI margin on the log scale may be used. The approach may also be applied to NI hypotheses for survival data based on hazard ratios, and to NI hypotheses for binary endpoints based on the odds ratio. An example from the thrombolytic area will be used for illustration. Unlike superiority trials (e.g., placebo-controlled trials), a poorly conducted NI trial (e.g., one that mixes up treatment assignments) would diminish whatever treatment difference may exist and hence bias the comparison in favor of the test treatment; this is the fundamental issue in NI trials. It is well recognized that multiplicity adjustment is not necessary when testing simultaneously for noninferiority and superiority. However, more experimental treatments that are expected to have the same effect as the active control will be tested for superiority in simultaneous testing than would be the case if only one null hypothesis were tested, thereby increasing erroneous claims of superiority. This leads to an increase in the false discovery rate for superiority.

About the Instructor
Dr. Ng received his Ph.D. degree in Statistics in 1980 from the University of Iowa. He held several positions before joining the US Food and Drug Administration (FDA) in 1987. He left the FDA in 1990 to work for the Henry M. Jackson Foundation. In 1995, he returned to the FDA, Center for Biologics Evaluation and Research (CBER). He is currently a team leader supporting the Office of Blood Research and Review within CBER. His research interests include equivalence/noninferiority testing and Bayesian approaches. Over the past 18 years, he has made numerous presentations at professional meetings and published extensively in the area of active-controlled/noninferiority studies. More specifically, his research has focused on the determination of the noninferiority (NI) margin and issues of simultaneous testing of NI and superiority. He first proposed that the NI margin should be a small fraction of the therapeutic effect of the active control relative to placebo in his 1993 paper published in the Drug Information Journal. Subsequently, two follow-up papers were published, one in the Drug Information Journal in 2001 and the other in Statistics in Medicine in 2008. He raised the issues of simultaneous testing of NI and superiority in two of his papers published in the Journal of Biopharmaceutical Statistics in 2003 and 2007.
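The margin-plus-confidence-interval logic described in this abstract can be sketched in a few lines of R (not from the course materials; the historical effect, fraction, and simulated data below are all assumptions made purely for illustration):

```r
set.seed(7)
hist_control_effect <- 4.0              # assumed historical effect of control vs placebo
f      <- 0.5                           # fraction of that effect to be preserved
margin <- f * hist_control_effect       # NI margin delta = 2.0 here

test_arm    <- rnorm(100, mean = 10.0, sd = 5)   # simulated outcomes, test treatment
control_arm <- rnorm(100, mean = 10.5, sd = 5)   # simulated outcomes, active control

# Two-sided 95% CI for mu_T - mu_C; a lower limit above -margin is equivalent to
# rejecting H0: mu_T - mu_C <= -margin at the one-sided 2.5% level
ci <- t.test(test_arm, control_arm)$conf.int
ci
ci[1] > -margin    # TRUE => conclude noninferiority
```

Choosing the fraction f, and relating the resulting margin to M1 and M2 of the FDA draft guidance, is exactly the judgment the course addresses.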

6. Strategies for Extracting Reliable Information from Megavariate Data (Half day)
Dhammika Amaratunga
Johnson & Johnson Pharmaceutical Research and Development

Abstract: A spate of recent advances in genomics has significantly altered the way research is being conducted in biology and medicine. It is now possible to investigate the behavior of genes and proteins thousands at a time, a powerful resource for the biological researcher. Of the current technologies, the most prominent is the DNA microarray, which can be used to profile the expression patterns of tens of thousands of genes simultaneously. How to properly analyze and interpret the enormous amounts of data this type of technology generates remains a challenge, as its high-dimensional (megavariate) structure, comprising many variables but few samples, renders it vulnerable to over-fitting and over-interpretation. Development of methodology for analyzing megavariate data remains a work in progress, but new techniques have been developed that are worthy of consideration. Ultimately, a multi-faceted approach is likely to be the most effective at extracting reliable information. For instance, for a standard well-designed comparative microarray experiment, a fairly rigorous prescription for determining a gene expression signature would include (1) a quality control step to settle any anomalies in the data and to ensure that the data indeed carry a signal, (2) an individual gene analysis to identify differentially expressed genes using a method that borrows strength across genes in a nonparametric way to increase efficiency, (3) an analysis of gene sets to identify affected biological processes and pathways, (4) an ensemble classification procedure to identify similarities and/or dissimilarities among the samples and the genes associated with any dissimilarities, and (5) a procedure to integrate concomitant data to assess concurrence of findings. This course will introduce the issues underlying megavariate data analysis and will use actual case studies to review this multi-faceted approach.

About the Instructor
Dr. Dhammika Amaratunga is Senior Research Fellow in Nonclinical Biostatistics at Johnson & Johnson Pharmaceutical Research & Development. He has been involved in microarray data analysis since 1997, the early days of microarrays. He and his team have numerous publications, including a book, and they have also given numerous presentations and courses on this topic. He has also been actively involved in a number of professional committees, including the ASA's Committee on the Award of Outstanding Application and PhRMA's Statistics Expert Teams on Pharmacogenomics and Biomarker Qualification, and is the Director for PERI's webinar series on Statistics in Genomics. He is a Fellow of the American Statistical Association. He has a B.Sc. from the University of Colombo, Sri Lanka, and a Ph.D. in Statistics from Princeton University, where he learnt the importance of careful exploratory data analysis while working under the supervision of John Tukey.
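A stripped-down version of step (2) of the prescription in this course description can be sketched in base R (not from the course; it uses simulated data and a plain per-gene t-test with FDR adjustment rather than the borrowing-strength nonparametric method the course advocates):

```r
set.seed(42)
n_genes <- 500
group   <- factor(rep(c("control", "treated"), each = 6))      # 6 vs 6 samples (assumed)
expr    <- matrix(rnorm(n_genes * 12), nrow = n_genes)          # gene-by-sample matrix
expr[1:25, group == "treated"] <- expr[1:25, group == "treated"] + 1.5  # 25 true signals

# Per-gene two-sample comparison, then Benjamini-Hochberg FDR adjustment
p_raw <- apply(expr, 1, function(g) t.test(g ~ group)$p.value)
p_fdr <- p.adjust(p_raw, method = "BH")

sum(p_fdr < 0.05)        # number of genes declared differentially expressed at 5% FDR
head(order(p_fdr), 10)   # indices of the strongest candidates
```

The remaining steps of the prescription (quality control, gene-set analysis, ensemble classification, and integration of concomitant data) require more than a one-gene-at-a-time view, which is the point of the multi-faceted approach.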


Working at the speed of life. Her life.

With over 20 years of experience in clinical studies and a commitment to operational excellence, you won't find a firm that'll work smarter for you. Or faster for her. Because getting medicines to those in need is critical, it's crucial you choose a functional outsourcing partner who shares your mission to reduce cycle time by efficiently improving quality. Call us today at 1-866-230-7992 and let us tell you more. Kforce Clinical Research. Working at the Speed of Life. (SM)

bridging talent & opportunity

Permanent Placement: Barbara Day 800-989-5627 [email protected] smithhanley.com

Contract Staffing: Mishka Symonette 800-684-9921 [email protected] smithhanleyconsulting.com

Statistics Biostatistics Modeling Epidemiology

Recruitment Specialists since 1980

Social Programs

Opening Mixer
Sunday, June 26th, 2011, 7:00 PM - 9:00 PM
Atrium (9th Floor), Westin Times Square

Cruise Night
Monday, June 27th, 2011
Enjoy a relaxed and refreshing 3-hour evening cruise (open to all participants and their families or friends) on Monday, June 27th. You'll circumnavigate Manhattan Island and see it all: 3 rivers, 7 major bridges, 5 boroughs, over 25 world-renowned landmarks and, of course, a magnificent close-up of the Statue of Liberty. Dinner and drinks are included. Tickets are $45 per person, and $30 for those 20 years old or younger, if registered online in advance. Limited on-site registration is available at $60 and $40, respectively.

Banquet
Tuesday, June 28, 2011, 6:30 PM - 9:00 PM; bus boarding time (at the Westin): 5:20 PM
Location: East Buffet and Restaurant, 42-07 Main Street, Flushing, NY 11355
Tel: 718-353-6333; Fax: 718-353-0628

Banquet Speech
Organizer: James Fu, Professor Emeritus, Department of Statistics, University of Manitoba, Canada
Chair: Xiaoli Meng, Whipple V. N. Jones Professor of Statistics, Department of Statistics, Harvard University

The early years of ICSA, 1968-1998; Dreams, Dreams
George C. Tiao, W. Allen Wallis Professor of Econometrics and Statistics (Emeritus), University of Chicago

Speaker biography: George C. Tiao was born in London in 1933. After graduating with a B.A. in Economics from National Taiwan University in 1955, he went to the US to obtain an M.B.A. from New York University in 1958 and a Ph.D. in Economics from the University of Wisconsin, Madison in 1962. From 1962 to 1982 he was Assistant, Associate, Professor and Bascom Professor of Statistics and Business at the University of Wisconsin, Madison, and in the period 1973-1975 was Chairman of the Department of Statistics. He moved to the Graduate School of Business at the University of Chicago in 1982 and is the W. Allen Wallis Professor of Econometrics and Statistics (Emeritus). George Tiao has played a leading role in the development of Bayesian Statistics, Time Series Analysis and Environmental Statistics. He is co-author, with G.E.P. Box, of Bayesian Inference in Statistical Analysis and is the developer of a model-based approach to seasonal adjustment (with S. C. Hillmer), of outlier analysis in time series (with I. Chang), and of new ways of vector ARMA model building (with R. S. Tsay). He is the author/co-author/co-editor of 7 books and over 120 articles in refereed econometric, environmental and statistical journals and has been the thesis advisor of over 25 students. He is a leading figure in the development of Statistics in Taiwan and China, the Founding President of the International Chinese Statistical Association (1987-1988) and the Founding Chair Editor of the journal Statistica Sinica (1988-1993). He played a leading role, over the 20-year period 1979-1999, in the organization of the annual NBER/NSF Time Series Workshop, and he was a founding member of the annual conference "Making Statistics More Effective in Schools of Business" (1986-2006). (Quoted from Statistical Science 2010, Vol. 25, No. 3, 408-428.)

Announcement

The 21st ICSA Applied Statistics Symposium will be held from June 23 (Saturday) to June 26 (Tuesday), 2012, at the Westin Boston Waterfront Hotel, located in the beautiful seaport district of Boston, Massachusetts. The conference will include short courses, technical presentations, student paper contests, and social events. Professor Bradley Efron of Stanford University, Professor Andrew Lo of MIT, and Dr. Richard Simon of the National Cancer Institute will deliver keynote speeches. Additional keynote speakers in the areas of Personalized Medicine and Finance may be added. Professor Shing-Tung Yau (邱成桐) of Harvard University will be our honored banquet speaker. The symposium short courses will cover topics including Comparative Effectiveness, Next Generation Sequencing, Adaptive Design Implementation and Execution, Biomarker Methods, Non-parametric Bayesian Statistics, etc. Short course topics and presenters will be finalized in the coming months. Please contact Dr. Tianxi Cai ([email protected]) if you have any suggestions or questions about short courses.

The symposium executive committee consists of the following members:
Mingxiu Hu, Millennium/The Takeda Oncology Company, Co-Chair
Tianxi Cai, Harvard University, Co-Chair and Program Committee Chair
Hongliang Shi, Millennium/The Takeda Oncology Company, Secretary/Treasurer
Ming-Hui Chen, University of Connecticut, Advisor
Naitee Ting, Boehringer Ingelheim, Fundraising Committee Chair
Mark Chang, AMAG Pharmaceuticals, ISBS Representative

We are striving to make this conference a memorable learning experience for all, and we welcome all ICSA and ISBS current and future members to participate and to provide proposals and suggestions.

Mingxiu Hu and Tianxi Cai (on behalf of the 2012 Symposium Executive Committee)

Scientific Program (June 27th - June 29th)
(� Presenting Author)

Monday, June 27, 8:00 AM-10:30 AM

Conference opening session (Invited) Room: Majestic (I/II), 5th floor Organizers: ICSA 2011 organizing committee. Chair: Zhezhen Jin, Columbia University.
8:00 AM Welcome

Keynote session I (Invited) Room: Majestic (I/II), 5th floor Organizers: ICSA 2011 organizing committee. Chair: Jianqing Fan, Princeton University.
8:20 AM Keynote lecture I David Donoho. Stanford University
9:10 AM Floor Discussion.

Keynote session II (Invited) Room: Majestic (I/II), 5th floor Organizers: ICSA 2011 organizing committee. Chair: Junfang Li, Mitsubishi Tanabe.
9:25 AM Keynote lecture II Ji Zhang. Sanofi-aventis U.S. LLC.
10:15 AM Floor Discussion.

Monday, June 27. 10:45 AM-12:25 PM

Monday, June 27. 10:45 AM-12:25 PM Session 1: Statistical Challenges Arising from Design and Analysis of NIH Studies (Invited) Room: Pearl, 9th floor Organizers: Gang Zheng, National Heart Lung and Blood Institute; Colin Wu, National Heart Lung and Blood Institute. Chair: Colin Wu, National Heart Lung and Blood Institute.

10:50 AM Challenges Arising from National Heart, Lung, and Blood Institute Clinical Trials
Nancy L. Geller. National Heart Lung and Blood Institute
11:13 AM Some Problems Arising in the Development and Evaluation of Risk Models
Mitchell H. Gail. National Cancer Institute, DCEG
11:36 AM Using Group Testing to Evaluate Gene-Environment Interaction
� Aiyi Liu1, Chunling Liu2, Paul Albert1 and Zhiwei Zhang1. 1 National Institute of Child Health & Human Development 2 The Hong Kong Polytechnic University
11:59 AM A Hybrid Parametric and Empirical Likelihood Ratio Statistic for Testing Interaction between Covariates in Case-Control Studies
� Jing Qin1, Hong Zhang2, Maria Teresa Landi2, Neil E. Caporaso2 and Kai Yu2. 1 National Institute of Allergy and Infectious Diseases 2 National Cancer Institute
12:22 PM Floor Discussion.

Session 2: Statistical Challenges from Survey, Behavioral and Social Data (Invited) Room: Palace, 9th floor Organizer: Tian Zheng, Columbia University. Chair: Tian Zheng, Columbia University.

10:50 AM A Corpus from a Single Survey � Andrew Gelman, Michael Malecki, Vincent Dorie and Wei Wang. Columbia University 11:13 AM On the Stationary Distribution of an Incompatible Gibbs Sampler � Jingchen Liu1 , Andrew Gelman1 , Jennifer Hill2 and YuSung Su3 . 1 Columbia University 2 New York University 3 Tsinghua University 11:36 AM Statistical Methods in Search Engine Ranking Daryl Pregibon1 , � Rachel Schutt1 , Ni Wang1 and Yong Li2 . 1 Google Research 2 Google


11:59 AM Demographic Diversity on the Web M. Irmak Sirer1 , � Jake M. Hofman2 and Sharad Goel2 . 1 Northwestern University 2 Yahoo! Research 12:22 PM Floor Discussion.

Session 3: Enhancing Probability of Success Using Modeling & Simulation (Invited) Room: New Amsterdam, 9th floor Organizers: Devan Mehrotra, Merck & Co., Inc.; Weili He, Merck & Co., Inc.. Chair: Weili He, Merck & Co., Inc..

10:50 AM Enhancing the Design of a Dose Ranging Study Using Modeling and Simulation � Yaming Hang and Devan Mehrotra. Merck & Co., Inc. 11:20 AM Enhancing Probability of Selecting the Best Compound Vlad Dragalin. ADDPLAN, An Aptiv Solutions company 11:50 AM An “Exposure”-Response Modeling Approach to Support PoC Decision Making - A Case Study Chyi-Hung Hsu. Johnson & Johnson Pharmaceutical RD, L.L.C. 12:20 PM Floor Discussion.

Session 4: Financial Statistics, Risk Management, and Rare Event Modeling (Invited) Room: Royale, 9th floor Organizer: Zhengjun Zhang, University of Wisconsin-Madison. Chair: Chunming Zhang, University of Wisconsin-Madison.

10:50 AM Extreme Temperatures and CME Temperature Derivatives
Debbie Dupuis. HEC Montréal
11:13 AM The Connection Between the Logit Model, Normal Discriminant Analysis Model, and the Multivariate Normal Mixtures
Weihu Cheng. Beijing University of Technology
11:36 AM HYBRID-GARCH: A Generic Class of Models for Volatility Predictions Using Mixed Frequency Data
Xilong Chen1, Eric Ghysels2 and � Fangfang Wang3. 1 SAS Institute Inc. 2 University of North Carolina at Chapel Hill 3 University of Illinois at Chicago
11:59 AM On the Estimation of Integrated Covariance Matrices of High Dimensional Diffusion Processes
� Xinghua Zheng and Yingying Li. The Hong Kong University of Science and Technology
12:22 PM Floor Discussion.

Session 5: Innovative Drug Safety Graphics Presentation (Invited) Room: Imperial, 4th floor Organizers: Qi Jiang, Amgen Inc.; Mat Soukup, U.S. Food and Drug Administration. Chair: Qi Jiang, Amgen Inc.; Mat Soukup, U.S. Food and Drug Administration.

10:50 AM Wiki Resources for Improving Your Statistical Graphs Richard Forshee. U.S. Food and Drug Administration, CBER/OBE

Session 7: High-Dimensional Feature Selection, Classification and Dynamic Modeling for Genetics Applications (Invited) Room: Plymouth, 9th floor Organizers: Hulin Wu, University of Rochester; Inchi Hu, The Hong Kong University of Science and Technology. Chair: Shaw-Hwa Lo, Columbia University.

10:50 AM A Mathematical Framework for Functional Mapping of Complex Systems Using Delay Differential Equations Guifang Fu, Zhong Wang, Jiahan Li and � Rongling Wu. The Pennsylvania State University 11:13 AM High-Dimensional Classification Using Influential MultiFactor Interactions Identified by SPV Algorithm � Jing-Shiang Hwang and Tsuey-Hwa Hu. Academia Sinica, Taiwan 11:36 AM A Classification Method Incorporating Interaction among Variables for High-Dimensional Data � Inchi Hu1 , Haitian Wang1 , Shaw-Hwa Lo2 and Tian Zheng2 . 1 The Hong Kong University of Science and Technology 2 Columbia University

11:13 AM Let Graphs Speak for You: Examples of Using Adverse Event Graphics for Safety Signal Detection Liping Huang. Roche Products Limited

11:59 AM High Dimensional ODEs Coupled with Mixed-Effects Modeling Techniques for Dynamic Gene Regulatory Network Identification Tao Lu1 , Hua Liang1 , Hongzhe Li2 and � Hulin Wu1 . 1 University of Rochester 2 University of Pennsylvania 12:22 PM Floor Discussion.

11:36 AM Organized and Effective Interpretation of Clinical Laboratory Data: Graphs Make a Difference Robert Gordon. Johnson & Johnson

Session 8: Recent Developments in Design and Analysis for High-Dimensional Data (Invited)

11:59 AM Graphical Presentations for ECG and Vitals Data � R.J. Anziano1 , R. Fiorentino2 , E. Frimpong2 , A. Paredes2 , P. Bridge3 and L. Huang3 . 1 Pfizer Inc. 2 U.S. Food and Drug Administration, CDER 3 Roche Products Limited 12:22 PM Floor Discussion.

Session 6: Topics in Statistical Machine Learning and High Dimensional Data Analysis (Invited) Room: Manhattan, 5th floor Organizers: Yufeng Liu, The University of North Carolina at Chapel Hill; Dacheng Liu, Boehringer Ingelheim Pharmaceuticals, Inc.. Chair: Dacheng Liu, Boehringer Ingelheim Pharmaceuticals, Inc..

10:50 AM Sparsity Inducing Credible Sets for High-Dimensional Variable Selection Howard Bondell. North Carolina State University 11:13 AM High-Dimensional Non-Linear Interaction Structures Peter Radchenko. University of Southern California 11:36 AM Adaptively Weighted Large Margin Classifiers � Yichao Wu1 and Yufeng Liu2 . 1 North Carolina State University 2 University of North Carolina at Chapel Hill 11:59 AM Parameter Estimation for Ordinary Differential Equations: An Alternative View on Penalty � Yun Li, Naisyin Wang and Ji Zhu. Department of Statistics, University of Michigan 12:22 PM Floor Discussion.


Room: Minskoff, 9th floor Organizer: Sijian Wang, University of Wisconsin-Madison. Chair: Qixuan Chen, Columbia University.

10:50 AM Estimating False Discovery Proportion under Arbitrary Covariance Dependence Jianqing Fan, � Xu Han and Weijie Gu. Princeton University 11:13 AM A Statistical Framework for Illumina DNA Methylation Arrays � Pei Fen Kuan1 , Sijian Wang2 , Xin Zhou1 and Haitao Chu3 . 1 University of North Carolina at Chapel Hill 2 University of Wisconsin-Madison 3 University of Minnesota-Twin Cities 11:36 AM Designs for the Lasso Xinwei Deng1 , � C. Devon Lin2 and Peter Z.G. Qian1 . 1 University of Wisconsin-Madison 2 Queen’s University 11:59 AM Graphical Model with Ordinal Variables � Jian Guo, Liza Levina, George Michailidis and Ji Zhu. University of Michigan 12:22 PM Floor Discussion.

Session 9: Pharmaceutical Safety (Invited)

Room: Nederlander, 9th floor Organizers: Greg Soon, U.S. Food and Drug Administration; Xiao Ding, U.S. Food and Drug Administration. Chair: Jiajun Liu, Merck & Co., Inc..

10:50 AM Graphic Display for Summarizing Individual Responses in Crossover Designed Human Abuse Potential Studies Ling Chen. U.S. Food and Drug Administration

11:20 AM Some Statistical Issues in a Safety Clinical Trial - Thorough QT/QTc Study Joanne Zhang. U.S. Food and Drug Administration 11:50 AM Meta-Analysis for Drug Safety Evaluation with Rare Outcomes � Xiao Ding and Mat Soukup. U.S. Food and Drug Administration 12:20 PM Floor Discussion.

Session 10: Lifetime Data Analysis (Invited)

Room: Gramercy, 4th floor Organizer: Mei-Ling Ting Lee, University of Maryland at College Park. Chair: Xin He, University of Maryland at College Park.

10:50 AM Two Criteria for Evaluating Risk Prediction Models � Ruth Pfeiffer and Mitchell H. Gail. National Cancer Institute 11:13 AM A Semiparametric Threshold Model for Censored Longitudinal Data Analysis � Jialiang Li and Wenyang Zhang. National University of Singapore 11:36 AM Analysis of Survival Data with Missing Censoring Indicators Gregg Dinse. The National Institute of Environmental Health Sciences 11:59 AM Connecting Threshold Regression and Accelerated Failure Time Models � Xin He1 , G. A. Whitmore2 and Mei-Ling Ting Lee1 . 1 University of Maryland 2 McGill University 12:22 PM Floor Discussion.

Session 11: High Dimensional Statistical Learning (Invited)

Room: Majestic (I/II), 5th floor Organizers: Yang Feng, Columbia University; Tian Zheng, Columbia University. Chair: Jinchi Lv, University of Southern California.

10:50 AM A ROAD to Classification in High Dimensional Space � Jianqing Fan1 , Yang Feng2 and Xin Tong1 . 1 Princeton University 2 Columbia University 11:13 AM SOFARE: Selection of Fixed and Random Effects in HighDimensional Longitudinal Data Analysis Yun Li1 , � Peter X. Song1 , Sijiang Wang2 and Ji Zhu1 . 1 University of Michigan 2 University of Wisconsin-Madison 11:36 AM The Screening and Ranking Algorithm to Detect DNA Copy Number Variations � Ning Hao1 , Yue Niu1 and Heping Zhang2 . 1 University of Arizona 2 Yale School of Public Health 11:59 AM Loss Adaptive Modified Penalty in Variable Selection Tengfei Li1 , � Yang Feng2 , Wen Yu1 , Zhiliang Ying2 and Hong Zhang3 . 1 Fudan University 2 Columbia University 3 University of Science and Technology of China 12:22 PM Floor Discussion. ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Session 12: Multiplicity Issues and Predictive Modeling of Enrollment in Clinical Trials (Invited) Room: Booth, 3rd floor Organizers: Xiaoxi Zhang, Pfizer Inc.; Juanmei Liu, MedTronic, Inc. Chair: Xiaoxi Zhang, Pfizer Inc.

10:50 AM Multiplicity Problems Arising in Subgroup Analysis � Alex Dmitrienko and Brian Millen. Eli Lilly and Company 11:20 AM Mixture Gatekeeping Procedures with Clinical Trial Applications � Ajit Tamhane1 and Alex Dmitrienko2 . 1 Northwestern University 2 Eli Lilly and Company 11:50 AM Empirical Bayes on Accrual Prediction for Multicenter trials H. Nguyen,1 , W. Strawderman2 and � C. Yu1 . 1 Pfizer Inc. 2 Rutgers University 12:20 PM Floor Discussion.

Session 13: Statistical Issues in Late-Stage HIV Clinical Trials (Invited) Room: Belasco, 3rd floor Organizers: Cunshan Wang, Pfizer Inc.; Sara Hughes, ViiV HealthCare; Dan Meyer, Pfizer Inc.; and Christy Chuang-Stein, Pfizer Inc.. Chair: Cunshan Wang, Pfizer Inc..

10:50 AM Challenges in Designing and Analyzing Confirmatory HIV Trials Guoxing Soon. U.S. Food and Drug Administration 11:13 AM Design and Monitoring Benefits and Risks in HIV Clinical Trials Using Prediction Scott Evans. Harvard University 11:36 AM Design and Endpoints for HIV Trials in Antiretroviral Treatment Naive Patients Mike Wulfsohn, Jim Rooney and � Lijie Zhong. Gilead Sciences, Inc. 11:59 AM Missing Data: It Is Better to Prepare and Prevent than to Repair and Repent Sara Hughes. ViiV Healthcare 12:22 PM Floor Discussion.

Session 14: New Developments in Methods for the Analysis of Repeated Measures Data (Contributed) Room: Melville, 5th floor Chair: Huilin Li, New York University.

10:50 AM Analyzing Repeated Measures Semi-Continuous Data, with Application to an Alcohol Dependence Study � Lei Liu1 , Robert L. Strawderman2 , Bankole Johnson1 and John O’Quigley1 . 1 University of Virginia 2 Cornell University 11:08 AM Estimation for Single-Index Mixed Effects Models � 1 Liugen Xue1 and Zhen Pang0 . Beijing University of Technology 11:26 AM Improving the Convergence Rate in Mixed-Effects Models � Guangxiang Zhang and John J. Chen. State University of New York at Stony Brook

11:44 AM Estimating Multiple Treatment Effects Using Two-Phase Regression Estimators
� Cindy Yu1, Jason Legg2 and Bin Liu1. 1 Iowa State University 2 Amgen Inc.
12:02 PM Mixed-Effects Models for Evaluating Cardiac Function and Treatment Effects
� Maiying Kong, Hyejeong Jang and Daniel J. Conklin. University of Louisville
12:20 PM Floor Discussion.

Monday, June 27. 2:00 PM - 3:40 PM

Session 15: Statistical Methods in Biomarker Discovery (Invited) Room: Minskoff, 9th floor Organizer: Liansheng Tang, George Mason University. Chair: Junfang Li, Mitsubishi Tanabe.

2:05 PM Efficient Two-Stage Smoothing Estimation Methods in Semivarying Ordinary Differential Equation Models with Application to Influenza Dynamics
� Hongqi Xue1, Arun Kumar2 and Hulin Wu1. 1 University of Rochester 2 Trinity University
2:28 PM Kernel Estimation for Three Way ROC Surface
� Chenqging Wu1, Liansheng Tang2 and Pang Du3. 1 Yale School of Public Health 2 George Mason University 3 Virginia Polytechnic Institute and State University
2:51 PM Semiparametric Time-Dependent ROC Models for Evaluating the Prognosis Accuracy of Biomarkers
� Nan Hu1 and Xiao-Hua Zhou2. 1 University of Utah 2 University of Washington
3:14 PM Simultaneously Comparing Accuracy among Clustered Diagnostic Markers, with Applications to the BioCycle Study
Liansheng Tang. George Mason University
3:37 PM Floor Discussion.

Session 16: Advancing Statistical Methodology for Handling Missing Data in Longitudinal Studies (Invited) Room: Nederlander, 9th floor Organizer: Joan Hu, Simon Fraser University. Chair: Joan Hu, Simon Fraser University.

2:05 PM Kernel Regression and Differential Equations
Willard John Braun. University of Western Ontario
2:28 PM A Weighted Simulation-Based Estimator for Incomplete Longitudinal Data
� Liqun Wang and He Li. University of Manitoba
2:51 PM Testing and Interval Estimation for Two-Sample Survival Comparisons with Small Sample Sizes and Unequal Censoring
� Rui Wang, Stephen Lagakos and Robert Gray. Harvard School of Public Health
3:14 PM A Comparison of Power Analysis Methods for Evaluating Effects of a Predictor on Slopes in Longitudinal Designs with Missing Data
� Cuiling Wang, Charles B. Hall and Mimi Kim. Department of Epidemiology and Population Health, Albert Einstein College of Medicine of Yeshiva University
3:37 PM Floor Discussion.

Session 17: Developments and Applications of Models with Time-Varying Covariates or Coefficients (Invited) Room: Majestic II, 5th floor Organizers: Zhongxin (John) Zhang, Johnson & Johnson; Surya Mohanty, Johnson & Johnson. Chair: Michael Lee, Johnson & Johnson.

2:05 PM Some Applications of Time-Varying Covariates by U.S. Food and Drug Administration Reviewers John Lawrence. U.S. Food and Drug Administration 2:28 PM Time-Varying Covariate Adjustment in Time-to-Event Data Analysis Julia Wang. Johnson & Johnson 2:51 PM Graphical Presentation for the Cox Model with TimeVarying Covariates � 1 Urania Dafni1 and Dimitris Karlis2 . University of Athens 2 Athens University of Economics and Business 3:14 PM (�Student Paper Award) Varying Coefficient Models for Modeling Diffusion Tensors Along White Matter Fiber Bundles � Ying Yuan, Hongtu Zhu, Martin Styner, John H. Gilmore and J. S. Marron. University of North Carolina at Chapel Hill 3:37 PM Floor Discussion.

Session 18: Mixture Models (Invited)

Room: Palace, 9th floor Organizers: Yuejiao Cindy Fu, York University; Yongzhao Shao, New York University. Chair: Weiwen Miao, Haverford College.

2:05 PM A New Nuisance Parameter Elimination Method with Application to Unordered Homologous Chromosome Pairs Problem � 1 Pengfei Li1 and Jing Qin2 . University of Alberta 2 National Institute of Allergy and Infectious Diseases 2:28 PM Nonparametric Estimation in Multivariate Mixture Models Hsiao-Hsuan Wang1 , � Yuejiao Fu1 and Jing Qin2 . 1 York University 2 National Institute of Allergy and Infectious Diseases, Biostatistics Research Branch 2:51 PM On Exchangeability and Mixtures of Normals Xinxin Jiang. Suffolk University 3:14 PM Lack of Sufficiency in Mixture Problems Yongzhao Shao. New York University 3:37 PM Floor Discussion. ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Session 19: Design and Analysis of Biomedical Studies (Invited) Room: Belasco, 3rd floor Organizers: Qiong Yang, Boston University; Yongzhao Shao, New York University. Chair: Wenbin Lu, North Carolina State University.

2:05 PM Application of Bayesian Methods with Singular Value Decomposition in Genome-Wide Association Study Soonil Kwon and � Xiuqing Guo. Cedars-Sinai Medical Center 2:28 PM Analyze Multivariate Phenotypes in Genetic Association Studies by Combining Univariate Association Tests Qiong Yang. Boston University 2:51 PM Sample Size Analysis for Pharmacogenetic Studies � Chi-hong Tseng1 and Yongzhao Shao2 . 1 University of California, Los Angeles 2 New York University

2:28 PM A Bayesian Model for Modeling Gene-Environment Interaction � Kai Yu1 and Faming Liang2. 1 National Cancer Institute 2 Texas A&M University 2:51 PM Semi-parametric Pseudo-Maximum-Likelihood Estimation Exploiting Gene-Environment Independence for Population-Based Case-Control Studies with Complex Sampling � Yan Li1 and Barry Graubard2. 1 The University of Texas at Arlington 2 National Cancer Institute 3:14 PM Semiparametric Maximum Likelihood Methods for Estimating Genetic and Environmental Effects with Case-Control Mother-Child Pair Data � Jinbo Chen1, Dongyu Lin1 and Hagit Hochner2. 1 University of Pennsylvania 2 Hebrew University 3:37 PM Floor Discussion.

3:14 PM Estimating Transitional Probabilities of Disease Status in Longitudinal Studies with Two-Phase Sampling Sujuan Gao. Indiana University 3:37 PM Floor Discussion.

Session 22: Non-/Semi-Parametric Models for Complex Data

Session 20: Bridging and multi-regional clinical trials (Invited)

2:05 PM Threshold Estimation Based on a P-Value Framework � Atul Mallik1 , Bodhi Sen2 , Moulinath Banerjee1 and George Michailidis1 . 1 University of Michigan 2 Columbia University

2:05 PM Statistical Challenges and Lessons Learned from MultiRegional Trials Daphne T.Y. Lin. U.S. Food and Drug Administration, CDER/OB/Division of Biometrics IV

canceled An M-Theorem for Bundled Parameters, with Application to the Efficient Estimation in a Linear Model for Censored Data � Ying Ding1 and Bin Nan2. 1 Eli Lilly and Company 2 Department of Biostatistics, University of Michigan

2:35 PM Establishing Consistency Across All Regions in a MultiRegional Clinical Trial � Chin-Fu Hsiao1 , Hsiao-Hui Tsou1 , H.M. James Hung2 , Yue-Ming Chen1 , Wong-Shian Huang1 and Wan-jung 1 Chang1 . National Health Research Institutes, Taiwan 2 U.S. Food and Drug Administration

2:35 PM Consistent Model Selection for Marginal Generalized Additive Model for Correlated Data Lan Xue1 , Annie Qu2 and � Jianhui Zhou3 . 1 Oregon State University 2 University of Illinois at Urbana-Champaign 3 University of Virginia

Room: Pearl, 9th floor Organizer: Yi Tsong, U.S. Food and Drug Administration. Chair: Greg Soon, U.S. Food and Drug Administration.

3:05 PM Evaluation of Regional Treatment Effect � Yi Tsong1 , W-J Chang2 , Xiaoyu Dong3 and Hsiao-Hui 1 Tsou2 . U.S. Food and Drug Administration 2 National Health Research Institutes, Taiwan 3 University of Maryland Baltimore County 3:35 PM Floor Discussion.

Session 21: Novel Approaches to the Genetic Dissection of Complex Traits (Invited) Room: Plymouth, 9th floor Organizer: Jinbo Chen, Perelman School of Medicine at the University of Pennsylvania. Chair: John Daye, University of Pennsylvania.

2:05 PM Joint Analysis of Binary and Quantitative Traits with Missing Data � Gang Zheng1 , Colin O Wu1 , Wenhua Jiang2 , Jungnam Joo3 and Joao AC Lima2 . 1 National Heart Lung and Blood Institute 2 The Johns Hopkins University School of Medicine 3 National Cancer Center, Korea ICSA Applied Statistics Symposium 2011, NYC, June 26-29

(Invited) Room: Majestic I, 5th floor Organizer: Bin Nan, University of Michigan. Chair: Peter Song, University of Michigan.

3:05 PM Analysis of Disease Progression Data via Progressive MultiState Models under Nonignorable Inspection Processes Baojiang Chen1 , � Grace Yi2 and Richard Cook2 . 1 University of Nebraska Medical Center 2 University of Waterloo 3:35 PM Floor Discussion.

Session 23: Time to Event Data Analysis (Invited)

Room: Melville, 5th floor Organizer: Judy Huixia Wang, Department of Statistics, North Carolina State University. Chair: Jiajia Zhang, University of South Carolina.

2:05 PM Comparison of Two Crossing Hazard Rate Functions Peihua Qiu. University of Minnesota 2:28 PM Robust Parameter Estimation in a Semiparametric Model for Case-Control Studies � Jingjing Wu1 , Rohana Karunamuni2 and Biao Zhang3 . 1 University of Calgary 2 University of Alberta 3 The University of Toledo

2:51 PM Response Adaptive Randomization for Delayed Survival Outcome with a Short-term Outcome �

Mi-Ok Kim1 , Chunyan Liu1 and Jack Lee2 . 1 Cincinnati Children’s Hospital Medical Center 2 The University of Texas MD Anderson Cancer Center 3:14 PM The “Modified Covariate” Method for Detecting Interactions between a Treatment and a Large Number of Predictors �

Lu Tian, A. Alizadeh, A. Gentles and R. Tibshirani. Stanford University 3:37 PM Floor Discussion.

Session 24: Adaptive Design in Clinical Trials (Invited) Room: Gramercy, 4th floor Organizer: Cheng-Shiun Leu, Columbia University. Chair: Cheng-Shiun Leu, Columbia University.

2:05 PM Subset Selection for Comparative Clinical Selection Trials Cheng-Shiun Leu, Ying-Kuen Cheung and � Bruce Levin. Department of Biostatistics, Columbia University

Session 26: Challenges in Comparative Effectiveness Research (Invited) Room: Booth, 3rd floor Organizer: Qi Jiang, Amgen Inc.. Chair: Qi Jiang, Amgen Inc..

2:05 PM Reflections on Heterogeneity and Exchangeability in Comparative Effectiveness Research Demissie Alemayehu. Pfizer Inc. 2:28 PM Taking a More Global Perspective in Choosing from Amongst Multiple Research Designs in the Conduct of Comparative Effectiveness Research Martin J. Zagari. Amgen Inc. 2:51 PM Strengths and Limitations of Administrative Healthcare Databases in Comparative Effectiveness Research Jesse Berlin. Johnson & Johnson 3:14 PM Non-Inferiority Trials and Their Relation to Indirect and/or Mixed Treatment Comparisons Steven Snapinn. Amgen Inc. 3:37 PM Floor Discussion.

2:28 PM Inference Following Adaptive Biased Coin Designs Steve Coad. Queen Mary, University of London 2:51 PM On an Adaptive Procedure for Selecting among Bernoulli Populations Pinyuen Chen. Syracuse University 3:14 PM The Levin-Robbins-Leu Random Subset Size Selection Procedure �

Cheng-Shiun Leu and Bruce Levin. Columbia University

3:37 PM Floor Discussion.

Session 25: Model Selection and Its Application in Clinical Trial Design (Invited) Room: Imperial, 4th floor Organizer: Jie Tang, Pfizer Inc.. Chair: Jie Tang, Pfizer Inc..

2:05 PM A Non-Inferiority Trial Design without the Need for a Conventional Margin
� Xi Chen1 and Hamparsum Bozdogan2. 1 PharmClint Co. 2 University of Tennessee, Knoxville
2:28 PM Shifting Model and Its Application in Unified Clinical Trial Analysis
Xi Chen1 and � Jie Tang2. 1 PharmClint Co. 2 Pfizer Inc.

2:51 PM Time to Conclusion for Phase II Cancer Trials and Its Implication to Trial Design �

Ying Lu1 and Shenghua Kelly Fan2 . 1 VA Palo Alto Health Care System and Stanford University 2 California State University, East Bay 3:14 PM (�Student Paper Award) Exact Meta-Analysis Approach for the Common Odds Ratio of 2 × 2 Tables with Rare Events �

Dungang Liu, Regina Liu and Minge Xie. Rutgers University

3:37 PM Floor Discussion.


Session 27: Challenges and Developments in Survival Analysis (Contributed) Room: New Amsterdam, 9th floor Chair: Hua Judy Zhong, New York University.

2:05 PM Extension of Cure Rate Model When Cured Is Partially Known � Yu Wu1 , Yong Lin2 , Shou-En Lu2 and Weichung J. Shih2 . 1 K&L Consulting Services Inc. 2 University of Medicine and Dentistry of New Jersey 2:28 PM A Semiparametric Transformation Cure Model for Interval Censoring � Man-Hua Chen1 and Chen-Hsin Chen2 . 1 Tamkang University 2 Academia Sinica, Taiwan 2:51 PM Insights on the Robust Variance Estimator Under RecurrentEvents Model � Hussein Al-Khalidi1 , Yili Hong2 , Thomas Fleming3 and 1 Terry Therneau4 . Duke University 2 Virginia Polytechnic Institute and State University 3 University of Washington 4 Mayo Clinic 3:14 PM Bayesian Transformation Models for Multivariate Survival Times � Mario de Castro1 , Ming-Hui Chen2 and Joseph G. Ibrahim3 . 1 University of Sao Paulo 2 University of Connecticut 3 University of North Carolina at Chapel Hill canceled A New Nonparametric Estimator for Survival Functions When Censoring Times Are Incomplete � 1 Chung Chang1 and Wei-Yann Tsai2 . Department of Applied Mathematics, National Sun Yat-sen University 2 Department of Biostatistics, Columbia University 3:37 PM Floor Discussion.

Session 28: Design and Analysis of Clinical Trials (Contributed) Room: Royale, 9th floor Chair: Zhaoling Meng, Sanofi-Aventis U.S. LLC.


Scientific Program (� Presenting Author) 2:05 PM Comparison Study of Different Dose-Finding Designs for Multiple Graded Toxicities in Oncology � Monia Ezzalfani1 , Marie-C`ecile Ledeley1 and Sarah Zohar2 . 1 Institut Gustave Roussy 2 Institut national de la sant`e et de la recherche m`edicale, Paris 2:23 PM Challenges in Non-inferiority Clinical Studies for Medical Devices Xu Yan. U.S. Food and Drug Administration 2:41 PM Analysis of Multi-Regional Clinical Trials: Applying a TwoTier Procedure to Decision-Making by Individual Local Regulatory Authorities � Yunling Xu and Nelson Lu. U.S. Food and Drug Administration, CDRH 2:59 PM Impact of Unequal Sample Sizes on Evaluation of Treatment Difference with Discrete Endpoints � Jin Xu and G. Frank Liu. Merck & Co., Inc. 3:17 PM The Use of Bayesian Hierarchical Models in the MultiRegional Clinical Trial for Medical Devices Shiling Ruan. U.S. Food and Drug Administration 3:35 PM Floor Discussion.

Monday June 27. 3:50 PM - 5:30 PM

Session 29: Law and Statistics (Invited)

Room: Nederlander, 9th floor Organizer: Gang Zheng, National Heart Lung and Blood Institute. Chair: Gang Zheng, National Heart Lung and Blood Institute.

3:55 PM An Analysis of the Statistical Summary Submitted by the Investment Bank in the SEC v. Goldman-Sachs Case: Did The Regulators Appreciate the Implications of the Data? Joseph Gastwirth. George Washington University 4:18 PM How Technology is (Rapidly) Expanding the Scope of the Law in Statistics Victoria Stodden. Columbia University 4:41 PM Perspectives on Meta-Analysis from the Avandia Cases � Michael O. Finkelstein1 and Bruce Levin2 . 1 Columbia Law School 2 Department of Biostatistics, Columbia University 5:04 PM Statistical Properties of Tests Used to Detect Disparate Impact in Discrimination Cases � Weiwen Miao1 and Joseph Gastwirth2 . 1 Haverford College 2 George Washington University 5:27 PM Floor Discussion.

Session 30: Stochastic Root-Finding and Optimization (Invited) Room: Royale, 9th floor Organizers: Hock Peng Chan, National University of Singapore; Inchi Hu, The Hong Kong University of Science and Technology. Chair: Remus Ho, City University of Hong Kong.

3:55 PM A Resampling-Based Stochastic Approximation Approach for Analysis of Large Geostatistical Data
� Faming Liang, Yichen Cheng and Qifan Song. Texas A&M University

4:25 PM A Simple Bayesian Approach to Multiple Change-Points � Haipeng Xing1 and Tze Leung Lai2. 1 State University of New York at Stony Brook 2 Stanford University 4:55 PM A Coupling Spline Method for Stochastic Root-Finding � Kwok-Wah Ho1 and Inchi Hu2. 1 The Chinese University of Hong Kong 2 The Hong Kong University of Science and Technology 5:25 PM Floor Discussion.

Session 31: Functional Data Analysis (Invited)

Room: Manhattan, 5th floor Organizers: Dawei Liu, Department of Biostatistics, University of Iowa; Ying Zhang, Department of Biostatistics, University of Iowa. Chair: Jiguo Cao, Simon Fraser University.

3:55 PM Quantitative Trait Loci Mapping with Differential Equation Models � Jiguo Cao1 and Rongling Wu2 . 1 Simon Fraser University 2 The Pennsylvania State University 4:25 PM Integrating Data Transformation in Principal Components Analysis � Mehdi Maadooliat1 , Jianhua Huang1 and Jianhua Hu2 . 1 Texas A&M University 2 The University of Texas MD Anderson Cancer Center 4:55 PM Local Tests for Identifying Anisotropic Diffusion Areas in Human Brain on DTI Tao Yu1 and � Chunming Zhang2 . 1 National University of Singapore 2 University of Wisconsin-Madison 5:25 PM Floor Discussion.

Session 32: Recent Developments and Future Prospective in Statistical Methods in Longitudinal Data (Invited) Room: Pearl, 9th floor Organizers: Julie Li, BlackRock, Inc.; Susan Wang, Boehringer Ingelheim Pharmaceuticals, Inc.. Chair: Julie Li, BlackRock, Inc..

3:55 PM Estimating Treatment Effects for Episodic Interventions Based on Response-Dependent Observation � Richard Cook, Meaghan Cuerden and Cecilia Cotton. University of Waterloo 4:25 PM Efficient Semiparametric Regression for Longitudinal Data with Nonparametric Covariance Estimation Yehua Li. University of Georgia 4:55 PM A Moving Average Cholesky Factor Model in Covariance Modeling for Longitudinal Data Weiping Zhang1 and � Chenlei Leng2 . 1 University of Science and Technology of China 2 National University of Singapore 5:25 PM Floor Discussion.

Session 33: High-Dimensional Inference in Biostatistical Applications (Invited) Room: Belasco, 3rd floor Organizer: X. Jessie Jeng, University of Pennslyvania. Chair: X. Jessie Jeng, University of Pennsylvania.

3:55 PM Ultra Dimension Reduction via Asymptotic Independent Correlation Coefficients Treasa Q. Cui and � Zhengjun Zhang. University of Wisconsin-Madison 4:18 PM Dimension Reduction and Variable Selection for Censored Regression in Genomics Data � Wenbin Lu and Lexin Li. North Carolina State University 4:41 PM Generalized Thresholding Estimators for High-Dimensional Location Parameters � Min Zhang1, Dabao Zhang1 and Martin T. Wells2. 1 Purdue University 2 Cornell University 5:04 PM High-Dimensional Modeling of Data with Correlated Variables and Structural Constraints with Applications in Statistical Genomics Z. John Daye. University of Pennsylvania School of Medicine 5:27 PM Floor Discussion.

Session 34: Statistical Machine Learning (Invited) Room: Palace, 9th floor Organizer: Ji Zhu, University of Michigan. Chair: Yunpeng Zhao, University of Michigan.

3:55 PM Functional Additive Regression
� Yingying Fan and Gareth James. University of Southern California
4:18 PM Forward-Lasso with Adaptive Shrinkage
Peter Radchenko and � Gareth James. University of Southern California
4:41 PM Pairwise Variable Selection for Classification
� Xingye Qiao1, Yufeng Liu2 and J. S. Marron2. 1 State University of New York at Binghamton 2 University of North Carolina at Chapel Hill

4:55 PM Merck Experience since the Food and Drug Administration Guidance on Adaptive Design � Keaven Anderson, Weili He, Jerald Schindler and Yang Song. Merck Research Laboratories 5:25 PM Floor Discussion.

Session 36: Spatial Statistics in Bio-medical Applications (Invited) Room: Gramercy, 4th floor Organizer: Ji Meng Loh, AT&T Labs Research. Chair: Ji Meng Loh, AT&T Labs Research.

3:55 PM Partial Likelihood Analysis of Spatio-Temporal Processes Peter Diggle. Lancaster University 4:18 PM On Low-Rank Dynamic Space-Time Models for Large Georeferenced Data � Sudipto Banerjee1 , Andrew O. Finley2 , Rajarshi 1 Guhaniyogi1 and Qian Ren1 . University of Minnesota 2 Michigan State University 4:41 PM Correlated Prior Models for Hidden Grouping in Small Area AFT Survival � Andrew Lawson1 and Jiajia Zhang2 . 1 Medical University of South Carolina 2 University of Southern California 5:04 PM Burgers and Fried Chicken: Characterizing the Spatial Distribution of Fast Food Restaurants in New York City � Ji Meng Loh1 and Naa Oyo Kwate2 . 1 AT&T Labs Research 2 Rutgers University 5:27 PM Floor Discussion.

Session 37: Nonparametric Inference and Secondary Analysis in Genomwide Studies (Invited) Room: Majestic I, 5th floor Organizer: Yongzhao Shao, New York University. Chair: Zhezhen Jin, Columbia University.

3:55 PM Nonparametric Election Forensics Raul Jimenez. Universidad Carlos III de Madrid

5:04 PM Multiclass Probability Estimation via Large Margin Classifiers Yichao Wu1 , � Hao Helen Zhang1 and Yufeng Liu2 . 1 North Carolina State University 2 University of North Carolina at Chapel Hill 5:27 PM Floor Discussion.

4:18 PM Efficient Adaptively Weighted Analysis of Secondary Phenotypes in Case-Control Genome-wide Association Studies � Huilin Li1 and Mitchell H. Gail2 . 1 New York University 2 National Cancer Institute

Session 35: Adaptive Designs Post-FDA Guidance: Challenges and Solutions (Invited)

5:04 PM Non-Parametric Bayesian Techniques and Models for Community Identification � Jiqiang Guo, Alyson Wilson and Dan Nordman. Iowa State University 5:27 PM Floor Discussion.

Room: Majestic II, 5th floor Organizer: Jose Pinheiro, Johnson & Johnson. Chair: Shiferaw Mariam, Johnson & Johnson.

3:55 PM Response-Adaptive Dose-Finding under Model Uncertainty � Frank Bretz1 , Bjorn Bornkamp1 , Holger Dette2 and Jose Pinheiro3 . 1 Novartis Pharmaceuticals Corporation 2 RuhrUniversity Bochum 3 Johnson & Johnson 4:25 PM Bayesian Response-Adaptive Dose-Ranging Studies: Design and Implementation Challenges � Michael Krams and Jose Pinheiro. Johnson & Johnson Pharmaceutical RD, L.L.C.


4:41 PM Non-Parametric Estimation of Surface Integrals Raul Jimenez1 and � Joe Yukich2 . 1 Universidad Carlos III de Madrid 2 Lehigh University

Session 38: Manufacturing and Quality Assessment (Invited) Room: Imperial, 4th floor Organizer: Yi Tsong, U.S. Food and Drug Administration. Chair: Xiao Ding, U.S. Food and Drug Administration.

3:55 PM Development of Content Uniformity Test for Large Sample Sizes Using Counting Method � Meiyu Shen and Yi Tsong. U.S. Food and Drug Administration ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Scientific Program (� Presenting Author) 4:25 PM Tolerance Interval Approaches and Hypothesis Testing for Pharmaceutical Quality Assessment Yi Tsong1 , Xiaoyu Dong2 , Meiyu Shen1 and � Jinglin Zhong1 . 1 U.S. Food and Drug Administration 2 University of Maryland Baltimore County 4:55 PM Development of Sample Size Dependent Dose Content Uniformity Specification � Xiaoyu Dong1 , Yi Tsong2 , Meiyu Shen2 and Jinglin Zhong2 . 1 University of Maryland Baltimore County 2 U.S. Food and Drug Administration, CDER/OB/Division of Biometrics VI 5:25 PM Floor Discussion.

Session 82: Handling Heaping (Invited)

Room: Melville, 5th floor Organizer: Daniel F. Heitjan, University of Pennsylvania. Chair: Yimei Li, University of Pennsylvania.

3:55 PM Accounting for Heaping in Retrospectively Reported Event Data - A Mixture-Model Approach � Haim Y. Bar and Dean R. Lillard. Cornell University 4:18 PM Modeling Heaping in Self-Reported Longitudinal Cigarette Count Data � Hao Wang1 and Daniel F. Heitjan2 . 1 The Johns Hopkins University 2 University of Pennsylvania

Session 39: Next Generation Pharmacovigilance: Methodological and Policy Challenges (Invited)

4:41 PM Non-Parametric Estimation of the Reporting Mechanism from Precise and Heaped Self-Report Data � Sandra D. Griffith1 , Saul Shiffman2 and Daniel F. Heitjan1 . 1 University of Pennsylvania 2 University of Pittsburgh

3:55 PM Approaches to Improving Safety Monitoring of Biologic Products at Food and Drug Administration/CBER Robert Ball. U.S. Food and Drug Administration, CBER/OBE

5:04 PM Model-Based Analysis of Heaped Longitudinal Cigarette Count Data in Smoking Cessation Trials Sandra Griffith1 , � Daniel F. Heitjan1 , Yimei Li2 , Hao Wang3 and E. Paul Wileyto1 . 1 University of Pennsylvania 2 The Children’s Hospital of Philadelphia 3 The Johns Hopkins University 5:27 PM Floor Discussion.

Room: Plymouth, 9th floor Organizer: Zhezhen Jin, Columbia University. Chair: Mengling Liu, New York University.

4:25 PM Use of Geographic Variation in Comparative Effectiveness and Pharmacovigilance Studies
� Mary Beth Landrum1, Frank Yoon1, Elizabeth Lamont1, Ellen Meara2, Amitabh Chandra3 and Nancy Keating1. 1 Harvard Medical School 2 Dartmouth Medical School 3 Harvard Kennedy School
4:41 PM Robert Davis
4:55 PM A Statistical Algorithm for Adverse Event Identification
Marianthi Markatou. IBM Thomas J. Watson Research Center and Cornell University
5:04 PM Discussion
5:25 PM Floor Discussion.

Session 40: Statistics in Drug Discovery and Early Development (Invited) Room: Booth, 3rd floor Organizer: Donghui Zhang, Sanofi-Aventis U.S. LLC.. Chair: Donghui Zhang, Sanofi-Aventis U.S. LLC..

3:55 PM Strategies and Tools for Hit Selection in High Throughput Screening Andy Liaw. Merck & Co., Inc. 4:18 PM Robust Small Sample Inference for Fixed Effects in General Gaussian Linear Models � Chunpeng Fan1 , Donghui Zhang1 and Cun-Hui Zhang2 . 1 Sanofi-Aventis U.S. Inc. 2 Rutgers University 4:41 PM Validation of Cell Based Image Features as Predictive Safety Biomarkers for DILI � Donghui Zhang and Chunpeng Fan. Sanofi-Aventis U.S. Inc. 5:04 PM PopPK Modeling for Oncology Drug Vorinostat with Doptimal Design Guided Sparse Sampling � Xiaoli Hou1 , Comisar Wendy1 , Nancy Agrawal1 and Bo Jin2 . 1 Merck & Co., Inc. 2 Pfizer Inc. 5:27 PM Floor Discussion. ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Session 41: Large Scale Data Analysis and Dimension Reduction Techniques in Regression Models (Contributed) Room: New Amsterdam, 9th floor Chair: Yongchao Ge, Mount Sinai School of Medicine.

3:55 PM A Note on Sliced Inverse Regression with Missing Predictors � 1 Yuexiao Dong1 and Liping Zhu2 . Temple University 2 Shanghai University of Finance and Economics 4:13 PM Functional Mapping with Robust t-Distribution � Cen Wu and Yuehua Cui. Department of Statistics and Probability, Michigan State University 4:31 PM Sufficient Dimension Reduction Based on Hellinger Integral Xiangrong Yin1 , Frank Critchley2 and � Qin Wang3 . 1 University of Georgia 2 The Open University 3 Virginia Commonwealth University 4:49 PM b-bit Minwise Hashing for Ultra-Large Scale Data Analysis � 1 Ping Li1 and Christian Konig2 . Cornell University 2 Microsoft Research 5:07 PM Hessian Inverse Transformation for Dimension Reduction Heng-Hui Lue. Tunghai University 5:25 PM Floor Discussion.

Session 42: New Approaches for Design and Estimation Issues in Clinical Trials (Contributed) Room: Minskoff, 9th floor Chair: Gang Li, Johnson & Johnson Pharmaceutical R&D U.S..

3:55 PM Mixed Effect Models in Cross-over Studies Fei Wang. Boehringer Ingelheim Pharmaceuticals 4:13 PM A Comparison of Methods for Pretest-Posttest Trials with Small Samples � Xiao Sun and Devan Mehrotra. Merck & Co., Inc.


11:14 AM Eric Tchetgen Tchetgen, Harvard School of Public Health

Tuesday, June 28. 10:00 AM - 11:40 AM

4:31 PM Multiplicity Issues in Clinical Trials with Co-primary End- 10:05 AM points, Secondary Endpoints and Multiple Dose Comparisons Haiyan Xu. Johnson & Johnson Pharmaceutical RD, L.L.C. 10:28AM 10:35 AM

Statistical Methods for Rare Variant Association Testing for Sequencing Data Xihong Lin. Harvard School of Public Health

A Shared-Association Model for Genetic Association Studies with Outcome Stratified Samples 4:49 PM Use of the Average Baseline versus the Time-Matched Base� Colin O. Wu, Gang Zheng and Minjung Kwak. National line in Parallel Group thorough QT/QTc Studies Heart Lung and Blood Institute � 1 1 1 1 Zhaoling Meng , Li Fan , Hui Quan , Robert Kringle and 2 1 2 11:05 AM AM Bayesian Quantitative Trait Loci Mapping for Gene-Gene Gordon Sun . Sanofi-Aventis U.S. Inc. Celgene Corpora-10:51 and Gene-Environment Interactions tion Fei Zou. University of North Carolina at Chapel Hill 5:07 PM Sample Size and Power Calculation for Poisson Data in Clin11:35 AM AM Floor Discussion. 11:37 ical Trials with Flexible Treatment Duration � Lin Wang and Lynn Wei. Sanofi-Aventis U.S. Inc. Session 45: High Dimensional Statistics in Genomics (Invited) 5:25 PM Floor Discussion. Room: Melville, 5th floor Organizer: Hongzhe Li, University of Pennsylvania. Tuesday, June 28. 8:45 AM - 9:45 AM Chair: Hongzhe Li, University of Pennsylvania. Keynote session III (Invited) Room: Majestic (I/II), 5th floor Organizers: ICSA 2011 organizing committee. Chair: Xihong Lin, Harvard School of Public Health.. 8:45 AM Keynote lecture III Danyu Lin. University of North Carolina at Chapel Hill 9:35 AM Floor Discussion.

Tuesday, June 28. 10:00 AM - 11:40 AM Session 43: Recent Development in Multivariate Survival Data Analysis (Invited) Room: Booth, 3rd floor Organizer: Wenqing He, University of Western Ontario. Chair: Edmund Luo, Merck & Co., Inc..

10:05 AM Analysis of Recurrent Episodes Data: the Length-Frequency Tradeoff Jason Fine. University of North Carolina at Chapel Hill 10:28 AM Projecting Population Risk with Time-to-Event Outcome � Dandan Liu, Li Hsu and Yingye Zheng. Fred Hutchinson Cancer Research Center 10:51 AM Methods of Analyzing Bivariate Survival Data with Interval Sampling from Population Based Cancer Registry � Hong Zhu1 and Mei-Cheng Wang2 . 1 The Ohio State University 2 The Johns Hopkins University 11:14 AM Robust Working Models in Survival Analysis of Randomized Trials Jane Paik. Stanford University 11:37 AM Floor Discussion.

10:05 AM Use of Orthogonal Statistics in Testing for GeneEnvironment Interactions � James Dai, Charles Kooperberg, Michael LeBlanc and Ross Prentice. Fred Hutchinson Cancer Research Center 10:35 AM Multiple Testing of Local Maxima for Peak Detection in ChIP-Seq Yulia Gavrilov1 , Clifford A. Meyer2 and � Armin Schwartzman3 . 1 Tel Aviv University 2 Dana-Farber Cancer Institute 3 Harvard School of Public Health 11:05 AM eQTL Mapping Using RNA-seq Wei Sun. University of North Carolina at Chapel Hill 11:35 AM Floor Discussion.

Session 46: Biomarker Discovery and Individualized Medicine (Invited) Room: Minskoff, 9th floor Organizer: Hongshik Ahn, Department of Applied Math and Statistics, State University of New York at Stony Brook. Chair: Hongshik Ahn, State University of New York at Stony Brook.

10:05 AM Selective Voting in Convex-Hull Ensembles Improves Classification Accuracy � Ralph L. Kodell, Chuanlei Zhang, Eric R. Siegel and Radhakrishnan Nagarajan. University of Arkansas for Medical Sciences 10:35 AM Quantitative Analysis of Genome-wide Chromatin Remodeling � Songjoon Baek, Myong-Hee Sung and Gordon L. Hager. National Cancer Institute, CCR, LRBGE 11:05 AM Comparative Genetic Pathway Analysis Xiao Wu. State University of New York at Stony Brook 11:35 AM Floor Discussion.

Session 44: Interface Between Nonparametric and Semiparametric Analysis and Genetic Epidemiology (Invited) Room: Palace, 9th floor Organizers: Yuanjia Wang, Columbia University; Naihua Duan, Columbia University. Chair: Yuanjia Wang, Columbia University.

Session 47: Applications in Spatial Statistics (Invited) Room: Plymouth, 9th floor Organizer: Zhengyuan Zhu, Iowa State University. Chair: Zhengyuan Zhu, Iowa State University.

10:05 AM Modeling the Spread of Plant Disease Using a Sequence of Binary Random Fields with Absorbing States
Mark S. Kaiser. Iowa State University

10:28 AM Autologistic Models for Binary Data on a Lattice � John Hughes1, Murali Haran2 and Petrutza Caragea3. 1 University of Minnesota 2 The Pennsylvania State University 3 Iowa State University 10:51 AM Theory and Practice for Massive Spatial Data Hao Zhang. Purdue University 11:14 AM Non-Parametric Estimation of Spatial Covariance Function � Zhengyuan Zhu and Li Yang. Iowa State University 11:37 AM Floor Discussion.

Session 48: The Totality of Evidence in Safety and Efficacy Evaluation of Medical Products (Invited) Room: Nederlander, 9th floor Organizers: Qian Li, U.S. Food and Drug Administration; Greg Soon, U.S. Food and Drug Administration. Chair: Yijie Zhou, Merck & Co., Inc..

10:05 AM Collective Evidence Qian Li. U.S. Food and Drug Administration 10:28 10:35 AM Adaptive Statistical Methods for Control of Type I Error Rate for Both Multiple Primary and Secondary Endpoints. � 10:51 AM Abdul J Sankoh and Haihong Li. Vertex Pharmaceuticals Inc. 11:14 11:05 AM Discussant: Greg Soon, U.S. Food and Drug Administration 11:35 AM Floor Discussion. 11:37

Collective Evidence in Medical Devices. Gregory Campbell. FDA

Session 49: Estimating Treatment Effects in Randomized Clinical Trials with Non-compliance and Missing Outcomes (Invited) Room: Manhattan, 5th floor Organizers: Greg Soon, U.S. Food and Drug Administration; Yan Zhou, U.S. Food and Drug Administration. Chair: Yijie Zhou, Merck & Co., Inc..

10:05 AM Analyzing Randomized Clinical Trials with NonCompliance and Non-ignorable Missing Data � Yan Zhou1 , Roderick J.A. Little2 and John D. Kalbfleisch2 . 1 U.S. Food and Drug Administration, CDER 2 Department of Biostatistics, University of Michigan 10:35 AM Handling Incomplete Data in Vaccine Clinical Trials � Ivan S.F. Chan, Xiaoming Li, William W.B. Wang and Frank G.H. Liu. Merck Research Laboratories 11:05 AM Discussant: Tom Permutt, U.S. Food and Drug Administration 11:35 AM Floor Discussion.

Session 50: Analysis of Biased Survival Data (Invited)

Room: Belasco, 3rd floor Organizer: Xiaodong Luo, Mount Sinai School of Medicine. Chair: Xiaodong Luo, Mount Sinai School of Medicine.

10:05 AM Semi-Parametric Modelling for Length-Biased Data � Yu Shen1 , Jing Ning2 and Jing Qin3 . 1 The University of Texas MD Anderson Cancer Center 2 The University of Texas School of Public Health 3 National Institutes of Health ICSA Applied Statistics Symposium 2011, NYC, June 26-29

10:28 AM Nonparametric Estimation for Right-Truncation Data: a Pseudo-Partial Likelihood Approach Wei-Yann Tsai1, Kam-Fai Wong2 and � Yi-Hsuan Tu3. 1 Department of Biostatistics, Columbia University 2 Institute of Statistics, National University of Kaohsiung 3 Department of Statistics, National Cheng Kung University 10:51 AM Kernel Density Estimation with Doubly Truncated Data Carla Moreira and � Jacobo de Uña-Álvarez. University of Vigo 11:14 AM Various Inferences from the Forward and Backward Recurrence Times in a Prevalent Cohort Study with Follow-up � David Wolfson1, Vittorio Addona2, Masoud Asgharian1 and Juli Atherton1. 1 McGill University 2 Macalester College 11:37 AM Floor Discussion.

Session 51: Suicide Research Methodology (Invited) Room: Gramercy, 4th floor Organizer: Naihua Duan, Columbia University. Chair: Naihua Duan, Columbia University.

10:05 AM Methodological Issues in Suicide Research in Mainland China Michael R. Phillips. Shanghai Mental Health Center, Shanghai Jiao Tong University 10:28 AM Predicting Short Term and Long Term Risk of Suicide Attempt after a Major Depressive Episode Hanga Galfalvy. Department of Psychiatry, Columbia University 10:51 AM The Controversy about Antiepileptic Drugs and Suicide: Asking the Right Question and Addressing Confounding by Multiple Indications � Sue M. Marcus1 and Robert D. Gibbons2 . 1 Columbia University 2 University of Chicago 11:14 AM Discussant: Tzu-Cheg Kao, Uniformed Services University of the Health Sciences 11:37 AM Floor Discussion.

Session 52: Joint Modeling of Longitudinal and Time-toEvent Data in Medical Research (Invited) Room: Pearl, 9th floor Organizer: Wei Shen, Eli Lilly and Company. Chair: Wei Shen, Eli Lilly and Company.

10:05 AM Joint Analysis of Longitudinal Growth and Interval Censored Mortality Data � Darby Thompson1 , Charmaine Dean1 , Terry Lee2 and Leilei Zeng3 . 1 Simon Fraser University 2 St. Paul’s Hospital 3 University of Waterloo 10:28 AM Robust Inference for Longitudinal Data Analysis with NonIgnorable and Non-Monotonic Missing Values Chi-Hong Tseng1 , � Robert Elashoff1 , Ning Li0 and Gang Li1 . 1 University of California, Los Angeles 10:51 AM An Exploration of Fixed and Random Effects Selection for Longitudinal Binary Outcomes in the Presence of NonIgnorable Dropout � Ning Li1 , Michael Daniels2 , Gang Li3 and Robert Elashoff3 . 1 Cedars-Sinai Medical Center 2 University of Florida 3 University of California, Los Angeles


11:14 AM One Scenario in Which Joint Modeling Is Unnecessary � Joel A Dubin1 and Xiaoqin Xiong2. 1 University of Waterloo 2 Information Management Services, Inc. 11:37 AM Floor Discussion.

Session 53: Survey Research Method and Its Application in Public Health (Invited) Room: Imperial, 4th floor Organizer: Huilin Li, New York University. Chair: Huilin Li, New York University.

10:05 AM The Use of Multiple Imputation to Adjust for Biases in Estimating the Long-Term Trend of Lung Cancer Incidence by Histologic Type from Population Based Data � Mandi Yu, Eric J. (Rocky) Feuer, Kathleen Cronin and Neil Caporaso. National Cancer Institute 10:28 AM Best Predictive Small Area Estimation Jiming Jiang1 , � Thuan Nguyen2 and J. Sunil Rao3 . 1 University of California, Davis 2 Oregon Health Sciences University 3 University of Miami 10:51 AM Identifying Implausible Gestational Ages with Reversible Jump MCMC � Guangyu Zhang1 , Nathaniel Schenker1 , Jennifer D. Parker1 and Dan Liao2 . 1 National Center for Health Statistics 2 University of Maryland 11:14 AM Efficient Analysis of Case-Control Studies with Sample Weights � Victoria Landsman and Barry Ira Graubard. National Cancer Institute 11:37 AM Floor Discussion.

Session 54: J P Hsu Memorial Session (Invited)

Room: Majestic (I/II), 5th floor Organizer: Karl E. Peace, Jiann-Ping Hsu College of Public Health, Georgia Southern University. Chair: Lili Yu, Jiann-Ping Hsu College of Public Health, Georgia Southern University.

10:05 AM Combined Estimation of Treatment Effect Under a Discrete Random Effects Model � K. K. Gordon Lan, Jose Pinheiro. Johnson & Johnson 10:28 AM A New Approach to Margin Specification and Testing of Associated Hypotheses � George Y.H. Chi1 , Gang Li2 . 1 Johnson & Johnson Pharmaceutical R&D, L.L.C. 2 Johnson & Johnson LifeScan 10:51 AM Targeted Maximum Likelihood Estimation: Estimatoin of causal effects in observational and experimental studies � Mark van der Laan, Ori Stittelman. University of California, Berkeley 11:14 AM (�Student Paper Award) Estimating Subject-Specific Treatment Differences for Risk-Benefit Assessment with Competing Risk Event-Time Data � Brian Claggett1 , Lihui Zhao1 , Lu Tian2 , Davide Castagno3 1 and Lee-Jen Wei1 . Harvard School of Public Health 2 Stanford University School of Medicine 3 Brigham and Women’s Hospital and University of Turin


11:37 AM Floor Discussion.

Session 55: Statistics in Environmental, Financial and Social Science (Contributed) Room: New Amsterdam, 9th floor Chair: Tyler McCormick, Columbia University.

10:05 AM Cox Model with Point-Impact Predictor: Numerical Aspect � Yulei Zhang and Ian W. McKeague. Columbia University 10:28 AM Modular Latent Structure Analysis Tianwei Yu. Emory University 10:51 AM Efficient Semiparametric GARCH Modelling of Financial Volatility Li Wang1 , Cong Feng1 , � Qiongxia Song2 and Lijian Yang3 . 1 University of Georgia 2 The University of Texas at Dallas 3 Michigan State University 11:14 AM Respiratory Protective Equipment, Health Outcomes, and Predictors of Mask Usage among World Trade Center Terrorist Attack Rescue and Recovery Workers Vinicius C. Antao1 , � L. László Pallos1 , Youn Shim1 , James H. Sapp II1 , Robert M. Brackbill1 , James E. Cone2 and Steven D. Stellman2 . 1 Agency For Toxic Substances and Disease Registry 2 New York City Department of Health 11:37 AM Floor Discussion.

Session 56: Meta Analysis and Evidence from Large Scale Studies (Contributed) Room: Royale, 9th floor Chair: Chunpeng Fang, Sanofi-Aventis.

10:05 AM Dynamics of Long-Term Imatinib Treatment Response � Min Tang1 , Mithat Gonen2 , Chani Field3 , Timothy P. Hughes3 , Susan Branford3 and Franziska Michor1 . 1 Harvard University and Dana-Farber Cancer Institute 2 Memorial Sloan-Kettering Cancer Center 3 The University of Adelaide 10:23 AM Using Probability Model to Evaluate Long Term Effect of FOBT in Colon Cancer Screening Dongfeng Wu. University of Louisville 10:41 AM Effect of Selective Serotonin Reuptake Inhibitors on Risk of Fractures: a Meta-analysis of Observational Studies � Qing Wu1 , Angelle F. Bencaz2 , Joseph G. Hentz1 and Michael D. Crowell3 . 1 Biostatistics, Mayo Clinic in Arizona 2 School of Medicine, Louisiana State University Health Sciences Center 3 College of Medicine, Mayo Clinic in Arizona 10:59 AM A Method to Estimate the Standard Error in Calculation of a Confidence Interval with Square-Root Transformed Data and Its Application to Meta Analysis in Clinical Trials � Jiangming Johnny Wu and Hanzhe Ray Zheng. Merck Research Laboratories 11:17 AM Phenotype-Specific Gene Co-Expression Detection and Validation � Cuilan Gao and Cheng Cheng. St. Jude Children's Research Hospital 11:35 AM Floor Discussion.


Tuesday, June 28. 1:30 PM - 3:10 PM Session 57: Semiparametric Models with Application in Biosciences (Invited) Room: Minskoff, 9th floor Organizers: Hua Liang, University of Rochester; Dacheng Liu, Boehringer Ingelheim Pharmaceutical, Inc.. Chair: Hua Liang, University of Rochester.

1:35 PM Variable Selection in Partly Linear Censored Regression Model Shuangge Ma1 and � Pang Du2 . 1 Yale University 2 Virginia Polytechnic Institute and State University 1:58 PM Semiparametric Estimation and Variable Selection for Longitudinal Surveys � Lily Wang1 and Suojin Wang2 . 1 University of Georgia 2 Texas A&M University 2:21 PM Semi-parametric Hybrid Empirical Likelihood Inference for Two-sample Comparison With Censored Data � Haiyan Su1 , Mai Zhou2 and Hua Liang3 . 1 Montclair State University 2 University of Kentucky 3 University of Rochester 2:44 PM Identification of Breast Cancer Prognosis Markers via Integrative Analysis � Shuangge Ma and Ying Dai. Yale University 3:07 PM Floor Discussion.

Session 58: Statistical Methods for Analysis of Next Generation Sequencing Data (Invited) Room: Pearl, 9th floor Organizer: Hongzhe Li, University of Pennsylvania. Chair: Hongzhe Li, University of Pennsylvania.

1:35 PM From Epigenetic Profiling to Understanding Transcription Regulatory Mechanism Xiaole Shirley Liu. Dana-Farber Cancer Institute and Harvard School of Public Health 1:58 PM Robust Identification of Sparse Segments in Ultra-High Dimensional Data Analysis Tony Cai, � X Jessie Jeng and Hongzhe Li. University of Pennsylvania 2:21 PM Statistical Modeling of Multi-reads in ChIP-Seq Analysis � Dongjun Chung1 , Pei Fen Kuan2 , Bo Li3 , Rajendran Sanalkumar4 , Kun Liang1 , Emery H. Bresnick4 , Colin Dewey3 and Sunduz Keles1 . 1 Department of Statistics, University of Wisconsin-Madison 2 Department of Biostatistics, University of North Carolina at Chapel Hill 3 Department of Computer Science, University of Wisconsin-Madison 4 Department of Stem Cell and Regenerative Biology, University of Wisconsin-Madison 2:44 PM Modeling Intensity Data from ABI/SOLiD Second Generation Sequencing � Hao Wu1 , Rafael Irizarry2 and Héctor Corrada Bravo3 . 1 Emory University 2 The Johns Hopkins University 3 University of Maryland 3:07 PM Floor Discussion.

Session 59: Network and Related Topics (Invited) Room: Nederlander, 9th floor Organizer: Jiashun Jin, Carnegie Mellon University. Chair: Jiashun Jin, Carnegie Mellon University.

1:35 PM Some Recent Advances in Compressed Counting Ping Li1 and � Cun-Hui Zhang2 . 1 Cornell University 2 Rutgers University 1:58 PM Quantifying Uncertainty in Network Summary Statistics � Eric Kolaczyk and Wes Viles. Boston University 2:21 PM Integrative Prescreening in Analysis of Multiple Cancer Genomic Studies � Rui Song1 , Jian Huang2 and Shuangge Ma3 . 1 Colorado State University 2 University of Iowa 3 Yale University 2:44 PM UPS Delivers Optimal Phase Diagram in High Dimensional Variable Selection Pengsheng Ji1 and � Jiashun Jin2 . 1 Cornell University 2 Carnegie Mellon University 3:07 PM Floor Discussion.

Session 60: Current Approaches for Pharmaceutical Benefit-Risk Assessment (Invited) Room: Plymouth, 9th floor Organizer: Bennett Levitan, Johnson & Johnson. Chair: Bennett Levitan, Johnson & Johnson.

1:35 PM Frameworks, Guidelines and Methodologies: What is New in Benefit-Risk Assessment? Bennett Levitan. Johnson & Johnson Pharmaceutical R&D, L.L.C. 1:58 PM Conjoint Analysis and Utility Models: Case Studies in Drug Benefit-Risk Assessment James Cross. Genentech, Inc. 2:21 PM Benefit-Risk of Multiple Sclerosis Treatments: Lessons Learnt in Multi-Criteria Decision Analysis Richard Nixon1 , Pedro Oliveira2 and � Blair Ireland3 . 1 Novartis Pharmaceuticals Corporation 2 RTI Health Solutions 3 Novartis Pharmaceuticals Corporation 2:44 PM Discussant: John Ferguson, Novartis Corporation 3:07 PM Floor Discussion.

Session 61: Causal Inference and its Applications in Drug Development (Invited) Room: Palace, 9th floor Organizers: Greg Soon, U.S. Food and Drug Administration; Thamban Valappil, U.S. Food and Drug Administration. Chair: Thamban Valappil, U.S. Food and Drug Administration.

1:35 PM Semiparametric Dimension Reduction for Mean Response Estimation with Application to HIV Study � Zonghui Hu, Dean Follmann and Jing Qin. National Institutes of Health 1:58 PM A Causal Effect Model with Stochastic Monotonicity Assumption � Chenguang Wang1 , Mike Daniels2 and Daniel Scharfstein3 . 1 U.S. Food and Drug Administration 2 University of Florida 3 The Johns Hopkins University


2:21 PM Assessing the Treatment Effects in Randomized Clinical Trials under Principal Stratification Framework Yahong Peng. Pfizer Inc. 2:44 PM Causal Inference and Safety Analysis of Weight Loss Drugs Daniel Rubin. U.S. Food and Drug Administration 3:07 PM Floor Discussion.

Session 62: Recent Advance in Multiplicity Approach (Invited) Room: Manhattan, 5th floor Organizers: David Li, Pfizer Inc.; Christy Chuang-Stein, Pfizer Inc.. Chair: David Li, Pfizer Inc..

1:35 PM Graphical Approaches to Multiple Test Procedures Frank Bretz. Novartis Pharmaceuticals Corporation 1:58 PM Multistage Parallel Gatekeeping with Retesting � George Kordzakhia1 and Alex Dmitrienko2 . 1 U.S. Food and Drug Administration 2 Eli Lilly and Company 2:21 PM Introduction to the Union Closure Method with Application to the Gatekeeping Strategy in Clinical Trials � Han-Joo Kim1 , Richard Entsuah2 and Justine Shults3 . 1 Forest Laboratories, Inc. 2 Merck Research Laboratories 3 University of Pennsylvania School of Medicine

1:35 PM Multiple-stage Sampling Procedure for Lot Release with Consideration of Both Manufacturer's and Consumer's Risks � Bo-guang Zhen, Tiehua Ng and Henry Hsu. U.S. Food and Drug Administration, CBER 1:58 PM An Alternate Gamma-Fitting Statistical Method For Anti-Drug Antibody Assays To Establish Assay Cut-Points For Data with Non-Normal Distribution Brian Schlain. Biogen Idec 2:21 PM Issues in Fitting Bioassay Data � Jerry W. Lewis1 and Jason Liao2 . 1 Biogen Idec 2 Teva Pharmaceuticals

2:44 PM Analytical Comparability for Bioprocess Changes and Follow-on Biologics Jason Liao. Teva Branded Pharmaceutical Products R&D, Inc. 3:07 PM Floor Discussion.

Session 65: Experimental Design and Clinical Trials (Invited)

Room: Imperial, 4th floor Organizer: Changxing Ma, State University of New York at Buffalo. Chair: Changxing Ma, State University of New York at Buffalo.

2:44 PM An Adaptive Alpha Spending Approach in Group Sequential Trials David Li. Pfizer Inc. 3:07 PM Floor Discussion.

1:35 PM Efficient and Ethical Adaptive Randomization Designs for Multi-armed Clinical Trials with Weibull Time-to-event Outcomes � Oleksandr Sverdlov1 , Yevgen Ryeznik2 and Weng Kee Wong3 . 1 Bristol-Myers Squibb 2 Kharkov National University of Economics 3 University of California, Los Angeles

Session 63: Bioassay: Methodology for a Rapidly Developing Area (Invited)

1:58 PM The Projective Generalized Aberration Criterion and Its Applications in Factorial Designs Chang-Xing Ma. State University of New York at Buffalo

Room: Melville, 5th floor Organizer: Xuelin Huang, The University of Texas MD Anderson Cancer Center. Chair: Yichuan Zhao, Georgia State University.

1:35 PM Quantification of Real Time Polymerase Chain Reaction Data � Xuelin Huang1 , Wei Wei2 and Jing Ning3 . 1 The University of Texas MD Anderson Cancer Center 2 The University of Texas MD Anderson Cancer Center 3 The University of Texas Health Science Center at Houston 2:05 PM Data Analysis for Limiting Dilution Transplantation Experiments Hao Liu. Baylor College of Medicine 2:35 PM Design for a Small Serial Dilution Series � Daniel Zelterman1 , Alexander Tulupyev2 , Robert Heimer1 and Nadia Abdala1 . 1 Yale University 2 Russian Academy of Sciences 3:05 PM Floor Discussion.

Session 64: Statistical Challenges in Developing Biologics (Invited) Room: Gramercy, 4th floor Organizers: Jason J.Z. Liao, Teva Branded Pharmaceutical Products R&D, Inc. ; Greg Soon, U.S. Food and Drug Administration. Chair: Yabing Mai, Merck & Co., Inc..


2:21 PM Phase II Cancer Clinical Trials with Heterogeneous Patient Populations Sin-Ho Jung1 , Myron N. Chang2 and � Sun J. Kang3 . 1 Duke University 2 University of Florida 3 SUNY Downstate Medical Center 2:44 PM A Proof-of-Concept Clinical Trial Design Combined with Dose Ranging Exploration Xin Wang1 and � Naitee Ting2 . 1 Pfizer Inc. 2 Boehringer Ingelheim Pharmaceuticals 3:07 PM Floor Discussion.

Session 66: Advancing Clinical Trial Methods (Invited) Room: Majestic I, 5th floor Organizer: Daniel F. Heitjan, University of Pennsylvania. Chair: Daniel F. Heitjan, University of Pennsylvania.

1:35 PM A “Simonized” Bayesian Design for Phase II Cancer Clinical Trials � Yimei Li1 , Rosemarie Mick2 and Daniel F Heitjan2 . 1 The Children's Hospital of Philadelphia 2 Philadelphia University 1:58 PM Prediction in Randomized Clinical Trials � Gui-shuang Ying and Daniel F. Heitjan. University of Pennsylvania 2:21 PM Analysis of Pediatric Obesity Studies: Difference in Conclusions Based on Choice of BMI-Derivate Outcome Renee H. Moore. University of Pennsylvania 2:44 PM Discussant: Cunshan Wang, Pfizer Inc. 3:07 PM Floor Discussion.

Session 67: Panel Session I: Adaptive Designs—When Can and How Do We Get There From Here? (Invited) Room: Majestic II, 5th floor Organizer: Yeh-Fong Chen, U.S. Food and Drug Administration. Chair: Sue-Jane Wang, U.S. Food and Drug Administration.

1:30 PM Panel Discussion Sue-Jane Wang1 , Jerry Schindler2 , Vlad Dragalin3 , Lu Cui4 , H.M. James Hung1 , Jack J. Lee5 and Brenda Gaydos6 . 1 U.S. Food and Drug Administration 2 Merck & Co., Inc. 3 ADDPLAN, Aptiv Solutions 4 Eisai Co., Ltd. 5 The University of Texas MD Anderson Cancer Center 6 Eli Lilly and Company 2:55 PM Floor Discussion.

Session 68: Theoretical Developments (Contributed) Room: New Amsterdam, 9th floor Chair: Feng Yang, Columbia University; Judy Hua Zhong, NYU.

1:35 PM The Effect of Preliminary Unit Root Tests on the Prediction Intervals for the Gaussian Autoregressive Processes with Additive Outliers � Wararit Panichkitkosolkul and Sa-aat Niwitpong. King Mongkut's University of Technology North Bangkok 1:53 PM Robust Confidence Interval for the Means of Zero-Lognormal Distribution � Mathew Anthony C. Rosales1 and Magdalena Niewiadomska-Bugaj2 . 1 Comsys 2 Western Michigan University 2:11 PM Proof for the Underestimation of the Greenwood-Type Estimators Jiantian Wang. Kean University

2:11 PM A Bayesian Approach for Early Pharmacokinetics Readout in Phase II Clinical Trials � Bo Jin1 and Yuyan Duan2 . 1 Biotherapeutics Statistics, Pfizer Inc. 2 Global Biometrics Sciences, Bristol-Myers Squibb 2:29 PM Theory and Method for Bayesian Inference of Cox Regression Model with Gamma Process Priors in Presence of Ties Zhiyi Chi, � Arijit Sinha and Ming-Hui Chen. University of Connecticut 2:47 PM Bayesian Nonparametric Centered Random Effects Models with Variable Selection Mingan Yang. Saint Louis University 3:05 PM Floor Discussion.

Tuesday, June 28. 3:30 PM - 5:10 PM Session 70: Statistical Analysis on Spatial and Temporal Data (Invited) Room: Pearl, 9th floor Organizer: Minggen Lu, University of Nevada, Reno. Chair: Wei Zhang, Boehringer Ingelheim Pharmaceuticals, Inc..

3:35 PM Additive Hazards Regression and Partial Likelihood Estimation for Ecological Monitoring Data Across Space � Feng-Chang Lin1 and Jun Zhu2 . 1 Department of Biostatistics, University of North Carolina at Chapel Hill 2 Department of Statistics, University of Wisconsin-Madison 3:58 PM Spatial-Temporal Analysis of Non-Hodgkin Lymphoma in a Case-Control Study David Wheeler. National Cancer Institute 4:21 PM On Modeling Ecological Monitoring Data in Space Jun Zhu. University of Wisconsin-Madison

2:29 PM Characterization through Distributional Properties of Generalized Order Statistics A. H. Khan1 and � Imtiyaz A. Shah2 . 1 Department of Statistics and Operations Research, Aligarh Muslim University 2 Aligarh Muslim University

4:44 PM Hierarchical Dynamic Modeling of Outbreaks of Mountain Pine Beetle Yanbing Zheng. University of Kentucky 5:07 PM Floor Discussion.

2:47 PM Pitman Closest Equivariant Estimators and Predictors under Location-Scale Models � Haojin Zhou and Tapan K Nayak. George Washington University 3:05 PM Floor Discussion.

Session 71: Recent Developments in Methods for Handling Missing Data (Invited)

Session 69: Bayesian Inferences and Applications (Contributed) Room: Royale, 9th floor Chair: Iryna Lobach, New York University.

1:35 PM Application of Normal Dynamic Linear Model (NDLM) in a Clinical Trial James Ngai. i3 Statprobe 1:53 PM A Bayesian Statistical Analysis Approach for Critical Success Factors Development � Grace Li, Fanni Natanegara, Ming-Dauh Wang and Paul Berg. Eli Lilly and Company

Room: Plymouth, 9th floor Organizer: Thomas R. Belin, Department of Biostatistics, University of California, Los Angeles. Chair: Thomas R. Belin, Department of Biostatistics, University of California, Los Angeles.

3:35 PM Bayesian Analysis for Mixtures of Incomplete Continuous, Ordinal and Nominal Repeated Measures � Xiao Zhang1 , John Boscardin2 and Tom Belin3 . 1 University of Alabama, Birmingham 2 University of California, San Francisco 3 University of California, Los Angeles 4:05 PM Diagnosing Imputation Models by Applying Target Analyses to Posterior Replicates of Completed Data � Yulei He and Alan M. Zaslavsky. Harvard Medical School


4:35 PM Bayesian Modeling and Inference for the Data with Informative Switches or Dropouts of Planned Treatments � Ming-Hui Chen1 , Qingxia Chen2 , Joseph G. Ibrahim3 and David Ohlssen4 . 1 University of Connecticut 2 Vanderbilt University 3 University of North Carolina at Chapel Hill 4 Novartis Pharmaceuticals Corporation 5:05 PM Floor Discussion.

Session 72: Methodology for and Applications of Administrative Data (Invited) Room: Minskoff, 9th floor Organizer: Rhonda Rosychuk, University of Alberta, Canada. Chair: Rhonda Rosychuk, University of Alberta, Canada.

3:35 PM Capture-Recapture Techniques to Estimate Chronic Disease Prevalence in the Presence of Misclassification Error in Administrative Health Databases � Lisa Lix and Chel Hee Lee. University of Saskatchewan 3:58 PM Methods for Evaluating ED Interventions Wendy Lou. University of Toronto 4:21 PM Public Health Administrative Data, Bayesian Disease Mapping, and Population Health Information and Policy Ying MacNab. The University of British Columbia 4:44 PM A Case Study on Analysis of Administrative Health Data in the Era of Knowledge Translation � Rhonda J Rosychuk and Amanda S Newton. University of Alberta 5:07 PM Floor Discussion.

Session 73: Fiducial Inference, Generalized Inference, and Applications (Invited) Room: New Amsterdam, 9th floor Organizer: Lili Tian, State University of New York at Buffalo. Chair: Hongkun Wang, Georgetown University.

3:35 PM Inferential Procedures Based on the Generalized Variable Approach with Applications � Krishnamoorthy Kalimuthu. University of Louisiana at Lafayette 4:05 PM Some Applications of Generalized Variable Approach in Diagnostic Studies � Lili Tian1 , Tuochuan Dong1 and Chengjie Xiong2 . 1 State University of New York at Buffalo 2 Washington University at St. Louis 4:35 PM On Generalized Fiducial Inference Jan Hannig. University of North Carolina at Chapel Hill 5:05 PM Floor Discussion.

3:58 PM A New Semiparametric Estimation Method for Accelerated Hazard Model � Jiajia Zhang1 , Yingwei Peng2 and Ou Zhao1 . 1 University of South Carolina 2 Queen's University 4:21 PM Analysis for Temporal Gene Expressions under Multiple Biological Conditions Hong-Bin Fang1 and � Dengliang Deng2 . 1 University of Maryland 2 University of Regina 4:44 PM Discussion: Hong-Bin Fang. University of Maryland School of Medicine 5:07 PM Floor Discussion.

Session 75: Statistical Method and Theory for High-Dimensional Data (Invited) Room: Manhattan, 5th floor Organizers: Zhigen Zhao, Temple University; Carnegie Mellon University. Chair: Yuexiao Dong, Temple University.

3:35 PM Control of Generalized False Discovery Proportion � Sanat Sarkar1 and Wenge Guo2 . 1 Temple University 2 New Jersey Institute of Technology 3:58 PM Large-Scale Multiple Testing under Dependence � Wenguang Sun1 and Tony Cai2 . 1 North Carolina State University 2 University of Pennsylvania 4:21 PM Sparse Estimation of Conditional Graphical Models with Application � Bing Li. The Pennsylvania State University 4:44 PM On the Generalized of BH Procedure Jiashun Jin1 and � Zhigen Zhao2 . 1 Carnegie Mellon University 2 Temple University 5:07 PM Floor Discussion.

Session 76: Statistical Genomics (Invited) Room: Palace, 9th floor Organizer: Hongyu Zhao, Yale University. Chair: Hongyu Zhao, Yale University.

3:35 PM Concordant Gene Set Enrichment Analysis of Two Large-Scale Expression Data Sets Yinglei Lai. George Washington University 3:58 PM Generalized Poisson Model for RNA-seq Data Analysis Sudeep Srivastava and � Liang Chen. Department of Biological Sciences, University of Southern California

Room: Nederlander, 9th floor Organizer: Hong-Bin Fang, University of Maryland School of Medicine. Chair: Hong-Bin Fang, University of Maryland School of Medicine.

4:21 PM Joint Estimation of Multiple Gaussian Graphical Models by Nonconvex Penalty Functions with an Application to Genomic Data � Hyonho Chun1 , Xianghua Zhang2 and Hongyu Zhao2 . 1 Purdue University 2 Yale University

3:35 PM Simultaneous Curve Registration and Clustering for Functional Data � Xueli Liu1 and Mark C.K. Yang2 . 1 City of Hope 2 University of Florida

4:44 PM Genetic Risk Predictions from Genome Wide Association Studies Ning Sun. Yale University 5:07 PM Floor Discussion.

Session 74: Functional Data Analysis (Invited)



Session 77: Assessment of Blinding and Placebo Effect (Invited) Room: Royale, 9th floor Organizers: Susan Wang, Boehringer Ingelheim Pharmaceuticals, Inc.; Naitee Ting, Pfizer Inc.. Chair: Fei Wang, Boehringer Ingelheim Pharmaceutical, Inc..

3:35 PM Blinding Assessment in Clinical Trials: A Review of Statistical Methods and a Proposal of Blinding Assessment Protocol � Heejung Bang1 , Stephen Flaherty2 , Jafar Kolahi3 and Jongbae Park2 . 1 Weill Cornell Medical College 2 University of North Carolina at Chapel Hill 3 Isfahan University of Medical Sciences 3:58 PM Design and Assessment of Blinding in Medical Device Trials � Alvin Van Orden and Martin Ho. U.S. Food and Drug Administration, CDRH

4:21 PM Placebo Effect-Adjusted Assessment of Quality of Life in Placebo-Controlled Clinical Trials Jens Eickhoff. University of Wisconsin-Madison 4:44 PM Discussant: Shiling Ruan, U.S. Food and Drug Administration 5:07 PM Floor Discussion.

Session 78: Recent Advances in Survival Analysis and Clinical Trials (Invited) Room: Booth, 3rd floor Organizers: Yichuan Zhao, Georgia State University; Zhezhen Jin, Columbia University. Chair: Zhezhen Jin, Columbia University.

3:35 PM Semiparametric Additive Transformation Model under Current Status Data � Guang Cheng and Xiao Wang. Purdue University 3:58 PM Evaluating Optimal Treatment Policies Based on Gene Expression Profiles � Ian McKeague1 and Min Qian2 . 1 Columbia University 2 University of Michigan 4:21 PM A Comparison of Multiple Imputation via Chained Equations and General Location Model for Accelerated Failure Time Models with Missing Covariates � Lihong Qi1 , Yulei He2 , Rongqi Chen1 and Xiaowei Yang1 . 1 University of California, Davis 2 Harvard University

4:44 PM Semiparametric Modeling and Inference under Possible Treatment-Time Interaction in Clinical Trials for Time to Event Data Song Yang. National Heart Lung and Blood Institute 5:07 PM Floor Discussion.

Session 79: Biomarker Based Adaptive Design and Analysis for Targeted Agent Development (Invited) Room: Gramercy, 4th floor Organizers: Bo Huang, Pfizer Inc.; Chunming (Mark) Li, Pfizer Inc. Chair: Bo Huang, Pfizer Inc.

3:35 PM An Oncology Case Study in Using a De Novo Approach to Identifying Drug-sensitive Subpopulations in the Add-on Therapy Setting Jared Lunceford and � Yue Wang. Merck Research Laboratories 3:58 PM Biomarker-Based Bayesian Adaptive Designs for Targeted Agent Development � J. Jack Lee, Suyu Liu and Nan Chen. The University of Texas MD Anderson Cancer Center 4:21 PM The Cross-Validated Adaptive Signature Design Boris Freidlin1 , � Wenyu Jiang2 and Richard Simon1 . 1 National Cancer Institute 2 Queen’s University 4:44 PM Biomarker Decision Rules in Early Phase Oncology Proof of Mechanism Trials David Raunig. Pfizer Inc. 5:07 PM Floor Discussion.

Session 80: Analysis of Complex data (Invited)

Room: Belasco, 3rd floor Organizer: Jingjing Wu, University of Calgary, Canada. Chair: Jingjing Wu, University of Calgary, Canada.

3:35 PM Efficient High-Order Gene-by-Gene Interaction Analysis for Genome-wide Association Analysis � Taesung Park1 , Sohee Oh1 , Jaehoon Lee1 , Min-Seok1 and Kyunga Kim2 . 1 Seoul National University 2 Sookmyung Women’s University 4:05 PM Comparative Evaluation of Gene-Set Analysis Methods for the Survival Phenotype � S. Y. Lee1 , J. H. Kim2 and S. H. Lee2 . 1 Sejong University and University of Washington 2 Sejong University 4:35 PM Bayes Multiple Decision Functions � Wensong Wu1 and Edsel Pena2 . 1 Department of Statistics, University of South Carolina 2 Department of Statistics, University of South Carolina 5:05 PM Floor Discussion.

Session 81: Historical Insight on Statisticians Role in the Pharmaceutical Development (Invited) Room: Majestic I, 5th floor Organizer: Xiaolong Luo, Celgene Corporation. Chair: Gang Li, Johnson & Johnson.

3:35 PM The Changing Role of Biostatistics in Drug Development Judith D. Goldberg. New York University School of Medicine 4:05 PM Impact of Statistical Thinking in the Regulatory Review of Drug Development Fanhui Kong. U.S. Food and Drug Administration 4:35 PM Asking Foolish Questions—A Statistician’s Role in Drug Research David Salsburg. Salsburg Statistical Consulting 5:05 PM Floor Discussion.


Session 83: Panel Session II: Industry-Academia Partnership: Successful Stories and Opportunities (Invited) Room: Majestic II, 5th floor Organizers: Kenneth Koury, Merck Research Laboratories; Ivan S.F. Chan, Merck Research Laboratories. Chair: Ivan S.F. Chan, Merck & Co., Inc.

3:30 PM Panel Discussion Kenneth Koury1 , Nan Laird2 , Marcia Levenstein3 and Raymond Bain1 . 1 Merck Research Laboratories 2 Harvard University 3 Pfizer Inc. 4:55 PM Floor Discussion.

Wednesday, June 29. 8:45 AM - 10:25 AM Session 84: Statistical Methods for Disease Genetics and Genomics (Invited) Room: Palace, 9th floor Organizer: Yu Zhang, The Pennsylvania State University. Chair: Yu Zhang, The Pennsylvania State University.

8:50 AM Statistical Association Tests of Rare Variants Wei Pan. University of Minnesota 9:13 AM Genotype Imputation in Whole Genome Shotgun Sequencing Data Using Haplotype Information of Reads � Kui Zhang, Degui Zhi, Nianjun Liu and Jihua Wu. Department of Biostatistics, University of Alabama, Birmingham 9:36 AM Statistical Methods for Testing CNV Association Hongzhe Li. University of Pennsylvania 9:59 AM Graph-Based Bayesian Interaction Mapping in Disease Association Studies Yu Zhang. The Pennsylvania State University 10:22 AM Floor Discussion.

Session 85: Enhancing Clinical Development Efficiency with Adaptive Decision Making (Invited) Room: Pearl, 9th floor Organizers: Weili He, Merck & Co., Inc.; Cong Chen, Merck & Co., Inc.. Chair: Edmund Luo, Merck & Co., Inc..

8:50 AM Promising Zone Designs for Oncology Trials Cyrus Mehta. Cytel Inc. 9:13 AM Optimal GNG Decision Rules for an Adaptive Seamless Phase II/III Oncology Trial � Cong Chen and Linda Sun. Merck & Co., Inc. 9:36 AM Type 1 and Type 2 Error Rates in Early Clinical Investigations Qing Liu and � Pilar Lim. Johnson & Johnson 9:59 AM A Framework for Joint Modeling and Assessment of Efficacy and Safety Data for Probability of Success Evaluation and Optimal Dose Selection � Weili He1 , Xiting Cao2 and Lu Xu3 . 1 Merck & Co., Inc. 2 University of Minnesota 3 GlaxoSmithKline Oncology R&D 10:22 AM Floor Discussion.


Session 86: Multivariate and Subgroup Analysis (Invited)

Room: Minskoff, 9th floor Organizer: Kelly H. Zou, Specialty Care Business Unit, Pfizer Inc.. Chair: Kelly H. Zou, Specialty Care Business Unit, Pfizer Inc..

8:50 AM Methods for Dimension Reduction Efstathia Bura. George Washington University 9:13 AM Bootstrap Methods for Assessing the Reliability of Subgroup Discovery in Clinical Trial Data � Javier Cabrera1 , Jiabin Wang1 and Ha Nguyen2 . 1 Rutgers University 2 Pfizer, Inc. 9:36 AM Log-Rank-Type Tests for Equality of Distributions in High-Dimensional Spaces � Xiaoru Wu, Zhiliang Ying and Tian Zheng. Columbia University 9:59 AM Statistical Cluster Detection and Pervasive Surveillance of Nuclear Materials Using Mobile Sensors Jerry Cheng1 , � Minge Xie2 , Rong Chen2 and Fred Roberts2 . 1 Columbia University 2 Rutgers University 10:22 AM Floor Discussion.

Session 87: Statistical Issues Arising from Clinical Research (Invited) Room: Manhattan, 5th floor Organizer: Nan Xue, Albert Einstein College of Medicine and Montefiore Medical Center. Chair: Nan Xue, Albert Einstein College of Medicine and Montefiore Medical Center.

8:50 AM Comparing Paired Biomarkers in Predicting Health Outcomes � Xinhua Liu and Zhezhen Jin. Columbia University 9:13 AM Regression with Latent Variables (MIMIC Models): A Better Way to Analyze Composite Scores from Instruments in Clinical Trials and Medical Research � Chengwu Yang1 , Barbara C. Tilley2 , Anbesaw W. Selassie3 and Ruth L. Greene4 . 1 The Pennsylvania State University College of Medicine 2 The University of Texas Health Science Center at Houston 3 Medical University of South Carolina 4 Johnson C. Smith University 9:36 AM Analyzing the Influence of Local Failure on Distant Recurrence in Breast Carcinoma Wei-Ting Hwang. University of Pennsylvania 9:59 AM Genetic Statistical Approach for Disease Risk Prediction Tao Wang. Albert Einstein College of Medicine 10:22 AM Floor Discussion.

Session 88: Recent Developments in Modeling Data with Informative Cluster Size (Invited) Room: Melville, 5th floor Organizer: Zhen Chen, National Institute of Child Health & Human Development/NIH. Chair: Mingan Yang, Saint Louis University.

8:50 AM Semiparametric Regression Analysis of Clustered Interval-Censored Failure Time Data with Informative Cluster Size � Xinyan Zhang1 and Jianguo Sun2 . 1 Harvard School of Public Health 2 University of Missouri

9:13 AM Analysis of Recurrent Gap Time Data Using the Weighted Risk-Set Method and the Modified Within-Cluster Resampling Method Xianghua Luo1 and � Chiung-Yu Huang2 . 1 University of Minnesota 2 National Institute of Allergy and Infectious Diseases 9:36 AM A Joint Modeling Approach to Data with Informative Cluster Size: Robustness to the Cluster Size Model � Zhen Chen, Bo Zhang and Paul Albert. National Institute of Child Health & Human Development 9:59 AM Discussant: Huiman Barnhart, Duke University 10:22 AM Floor Discussion.

Session 89: New Developments in High Dimensional Variable Selection (Invited) Room: Plymouth, 9th floor Organizers: Jinchi Lv, University of Southern California; Ji Zhu. Chair: Yingying Fan, University of Southern California.

8:50 AM Model-Free Feature Screening for Ultrahigh Dimensional Data � Liping Zhu1 , Lexin Li2 , Runze Li3 and Lixing Zhu4 . 1 Shanghai University of Finance and Economics 2 North Carolina State University 3 The Pennsylvania State University 4 Hong Kong Baptist University 9:20 AM Non-Concave Penalized Composite Likelihood Estimation of Sparse Ising Models � Lingzhou Xue1 , Hui Zou1 and Tianxi Cai2 . 1 University of Minnesota 2 Harvard University 9:50 AM Model Selection Principles in Misspecified Models � Jinchi Lv1 and Jun S. Liu2 . 1 University of Southern California 2 Harvard University 10:20 AM Floor Discussion.

Session 90: Model Selection and Related Topics (Invited)

Room: Gramercy, 4th floor Organizers: Yichao Wu, North Carolina State University; Mengling Liu, New York University. Chair: Xingye Qiao, Binghamton University, State University of New York.

8:50 AM Adaptive Minimax Estimation with Sparse l-q Constraints Yuhong Yang. University of Minnesota 9:13 AM Consistency of Community Detection for Networks under Degree-Corrected Block Models � Yunpeng Zhao, Liza Levina and Ji Zhu. Department of Statistics, University of Michigan 9:36 AM Statistical Analysis of Next-Generation Sequencing Data Wenxuan Zhong. University of Illinois at Urbana-Champaign 9:59 AM Efficient Estimation and Variable Selection in Varying-Coefficient Partially Linear Models � Bo Kai1 , Runze Li2 and Hui Zou3 . 1 College of Charleston 2 The Pennsylvania State University 3 University of Minnesota 10:22 AM Floor Discussion.

Wednesday, June 29. 8:45 AM - 10:25 AM Session 91: Empirical Likelihood and Its Application (Invited) Room: Majestic I, 5th floor Organizer: Wenqing He, The Uinversity of Western Ontario. Chair: Wenqing He, Uinversity of Western Ontario.

8:50 AM Population Empirical Likelihood for Nonparametric Inference in Survey Sampling � Sixia Chen and Jae Kwang Kim. Department of Statistics, Iowa State University 9:20 AM Empirical Likelihood-Based Inferences for a Low Income Proportion Baoying Yang1 , � Gengsheng Qin2 and Jing Qin3 . 1 Sichuan University and Georgia State University 2 Georgia State University 3 National Institute of Allergy and Infectious Diseases 9:50 AM Efficient Empirical Likelihood Inference in Partial Linear Models for Longitudinal Data � Suojin Wang1 and Lianfen Qian2 . 1 Texas A&M University 2 Florida Atlantic University 10:20 AM Floor Discussion.

Session 92: Design and Analysis Issues in DNA methylation (Invited) Room: Nederlander, 9th floor Organizer: Shuang Wang, Columbia University. Chair: Shuang Wang, Columbia University.

8:50 AM Preprocessing Illumina DNA Methylation BeadArrays � Kimberly Siegmund and Tim Triche Jr. University of Southern California 9:13 AM Differential Inference of DNA Methylation Based on an Ensemble of Mixture Models � Shili Lin and Cenny Taslim. The Ohio State University 9:36 AM DNA Methylation Arrays as a Surrogate Measure of Cell Mixtures � E. Andres Houseman1 , William P. Accomando1 , Devin C. Koestler1 , Brock C. Christensen1 , Carmen J. Marsit1 , Karl T. Kelsey1 and John K. Wiencke2 . 1 Brown University 2 University of California, San Francisco 9:59 AM Method to Detect Differentially Methylated Loci with Case-Control Designs Shuang Wang. Columbia University 10:22 AM Floor Discussion.

Session 93: Recent Advances in Statistical Inference for Functional and Longitudinal Data (Invited) Room: Imperial, 4th floor Organizer: Lijian Yang, Michigan State University. Chair: Lily Wang, The University of Georgia.

8:50 AM Spatial Interpolation for Functional Data Tatiyana Apanasovich. Thomas Jefferson University 9:13 AM Simultaneous Inference for the Mean Function of Dense Functional Data � Guanqun Cao, Lijian Yang and David Todem. Michigan State University


9:36 AM Additive Modeling of Functional Gradients Hans-Georg Mueller1 and � Fang Yao2 . 1 University of California, Davis 2 University of Toronto 9:59 AM A Confidence Corridor for Sparse Longitudinal Data Curves � Shuzhuan Zheng1 , Lijian Yang2 and Wolfgang K. Härdle3 . 1 Michigan State University 2 Michigan State University and Soochow University 3 Humboldt-Universität zu Berlin and National Central University 10:22 AM Floor Discussion.

Session 94: Emerging Statistical Methods and Theories for Complex and Large Data (Invited) Room: Booth, 3rd floor Organizer: Minge Xie, Rutgers University. Chair: Minge Xie, Rutgers University; Lee Dicker, Rutgers University.

8:50 AM Large Volatility Matrix Inference � Yazhen Wang1 and Jian Zou2 . 1 University of Wisconsin-Madison 2 National Institute of Statistical Sciences 9:13 AM Biological Pathway Selection through Nonlinear Dimension Reduction Hongjie Zhu and � Lexin Li. North Carolina State University 9:36 AM Smooth Shrinkage Estimators for High-Dimensional Linear Models Lee Dicker. Rutgers University 9:59 AM Bayesian Inference for Finite Population Quantiles from Unequal Probability Samples � Qixuan Chen1 , Michael R. Elliott2 and Roderick J.A. Little2 . 1 Department of Biostatistics, Columbia University 2 Department of Biostatistics, University of Michigan 10:22 AM Floor Discussion.

Session 95: Complex Multivariate Outcomes in Biomedical Science (Invited) Room: Belasco, 3rd floor Organizer: Eva Petkova, New York University. Chair: Eva Petkova, New York University.

8:50 AM Estimation of Piecewise Constant Function from 1D or 2D Correlated Signals in an fMRI Experiment Johan Lim1 and � Sang Han Lee2 . 1 Seoul National University 2 Nathan S. Kline Institute 9:13 AM Lower-Dimensional Approximation for Functional Data with Its Application to Screening Young Children’s Growth Paths � Wenfei Zhang and Ying Wei. Columbia University

Session 96: Application of Machine Learning Approaches in Biomedical Research (Contributed) Room: New Amsterdam, 9th floor Chair: Mengling Liu, New York University.

8:50 AM Estimating Planned Sales Call Frequencies with Incomplete Information Using the EM Algorithm � Lan Ma Nygren and Lewis Coopersmith. Rider University 9:08 AM Multilevel Latent Class Analysis of Stages of Change for Multiple Health Behaviors � Luohua Jiang1 , Jannette Beals2 , Christina Mitchell2 , Spero Manson2 , Kelly Acton2 and Yvette Roubideaux3 . 1 Texas A&M Health Science Center 2 University of Colorado, Denver 3 Indian Health Service 9:26 AM Outcome Weighted Learning for Selecting Individualized Treatment Regimens � Yingqi Zhao, Donglin Zeng and Michael Kosorok. Department of Biostatistics, University of North Carolina at Chapel Hill 9:44 AM Selecting a Target Patient Population Effectively for Desirable Treatment Benefits with the Data from a Randomized Comparative Study � Lihui Zhao1 , Lu Tian2 , Tianxi Cai1 , Brian Claggett1 and Lee-Jen Wei1 . 1 Harvard University 2 Stanford University 10:02 AM Kernel Machine Tests for Rare Genetic Variants in Sequencing Studies Michael C. Wu. University of North Carolina at Chapel Hill 10:20 AM Floor Discussion.

Session 97: Challenges in the Development of Regression Models (Contributed) Room: Royale, 9th floor Chair: Yuanjia Wang, Columbia University.

8:50 AM Lack-of-Fit Testing of a Regression Model with Response Missing at Random Xiaoyu Li. Michigan State University 9:08 AM Statistical Modeling for Study on Health Behaviors Associated with Use of Body Building, Weight Loss, and Performance Enhancing Supplements � Tzu-Cheg Kao1 , Yi-Ting Tsai1 , Daniel Burnett2 , Mark Stephens3 and Patricia A. Deuster4 . 1 Division of Epidemiology and Biostatistics, PMB, Uniformed Services University 2 General Preventive Medicine Residency Program, PMB, Uniformed Services University 3 Department of Family Medicine, Utah State University 4 CHAMP, Department of Military and Emergency Medicine, Utah State University 9:26 AM Scaled Sparse Linear Regression � Tingni Sun and Cun-Hui Zhang. Rutgers University

9:36 AM Penalized Cluster Analysis with Applications to Family Data � Yixin Fang1 and Junhui Wang2 . 1 New York University 2 University of Illinois at Chicago

9:44 AM Generalized Linear Varying-Coefficient Model with Auxiliary Covariates � Jianwei Chen and Qian Xu. San Diego State University

9:59 AM Massively Parallel Nonparametrics � Philip Reiss1 and Lei Huang2 . 1 Nathan Kline Institute, New York University 2 New York University 10:22 AM Floor Discussion.

10:02 AM Nonlinear Varying Coefficient Model and Its Applications � Esra Kurum1 , Runze Li1 , Damla Senturk1 and Yang Wang 2 . 1 The Pennsylvania State University 2 Freddie Mac 10:20 AM Floor Discussion.


Wednesday, June 29. 10:45 AM-12:25 PM Session 98: Recent Development in Measurement Error Models (Invited) Room: Nederlander, 9th floor Organizer: Yanyuan Ma, Texas A&M University. Chair: Yanyuan Ma, Texas A&M University.

10:50 AM Regression-Assisted Deconvolution Julie McIntyre1 and � Leonard A. Stefanski2 . 1 Department of Mathematics and Statistics, University of Alaska, Fairbanks 2 Department of Statistics, North Carolina State University 11:13 AM Nonlinear Models with Measurement Errors Subject to Single-indexed Distortion Jun Zhang1 , Lixing Zhu2 and � Hua Liang3 . 1 East China Normal University 2 Hong Kong Baptist University 3 University of Rochester 11:36 AM Novel Methods for Misclassification Correction � John Staudenmayer1 and Meng-Shiou Shieh2 . 1 University of Massachusetts at Amherst 2 Baystate Medical Center 11:59 AM Quantile Regression with Measurement Errors � Ying Wei1 and Raymond Carroll2 . 1 Department of Biostatistics, Columbia University 2 Department of Statistics, Texas A&M University 12:22 PM Floor Discussion.

Session 99: Recent Developments in Time Series (Invited) Room: Manhattan, 5th floor Organizer: Zhibiao Zhao, The Pennsylvania State University. Chair: Zhibiao Zhao, The Pennsylvania State University.

10:50 AM Banding the Sample Covariance Matrix of Stationary Processes � Mohsen Pourahmadi1 and Wei Biao Wu2 . 1 Texas A&M University 2 University of Chicago 11:13 AM Modeling Dependence in a Network of Brain Signals Cristina Gorrostieta and � Hernando Ombao. Brown University 11:36 AM Gradient Based Cross Validation Method Daniel Henderson1 , � Qi Li2 and Chris Parmeter3 . 1 State University of New York at Binghamton 2 Texas A&M University 3 University of Miami 11:59 AM Inference for Non-Stationary Time Series � Zhibiao Zhao and Xiaoye Li. The Pennsylvania State University 12:22 PM Floor Discussion.

Session 100: Challenging Topics in Longitudinal Data (Invited) Room: Plymouth, 9th floor Organizer: Annie Qu, University of Illinois at Urbana-Champaign. Chair: Grace Yi, University of Waterloo.

10:50 AM Some Aspects of Analyzing Longitudinal Data Using Functional Data Analysis Methods Naisyin Wang.

11:20 AM Marginal Methods for Correlated Binary Data with Misclassified Responses Zhijian Chen, Grace Y. Yi and � Changbao Wu. University of Waterloo 11:50 AM Modelling of Dose-Response-Time Data � Bjorn Bornkamp1 , Chyi-Hung Hsu2 , Jose Pinheiro2 and Frank Bretz1 . 1 Novartis Pharmaceuticals Corporation 2 Johnson & Johnson 12:20 PM Floor Discussion.

Session 101: High-Dimensional Models (Invited)

Room: Melville, 5th floor Organizer: Xiaotong Shen, University of Minnesota. Chair: Wei Pan, University of Minnesota.

10:50 AM Functional LARS for High Dimensional Additive Models Lifeng Wang. Michigan State University 11:13 AM Consistent Model Selection by LASSO Yongli Zhang. University of Oregon 11:36 AM Additive Risk Analysis of Gene Expression Data via Correlation Principal Component Regression � Yichuan Zhao and Guoshen Wang. Georgia State University 11:59 AM A Beta-Mixture Model for Assessing Genetic Population Structure � Dipak K. Dey1 , Rongwei Fu2 and Kent Holsinger1 . 1 University of Connecticut 2 Oregon Health and Science University 12:22 PM Floor Discussion.

Session 102: Phase I/II Clinical Studies: Safety versus Efficacy (Invited) Room: Palace, 9th floor Organizer: Bin Cheng, Columbia University. Chair: Bin Cheng, Columbia University.

10:50 AM Optimizing the Concentration and Bolus of a Drug Delivered by Continuous Infusion � Peter F. Thall1 , Aniko Szabo2 , Hoang Q. Nguyen1 , Catherine M. Amlie-Lefond2 and Osama O. Zaidat2 . 1 The University of Texas MD Anderson Cancer Center 2 Medical College of Wisconsin 11:20 AM A Bayesian Adaptive Design for Multi-dose, Randomized, Placebo-controlled Phase I/II Trials Yuan Ji. The University of Texas MD Anderson Cancer Center 11:50 AM Continual Reassessment Method with Multiple Toxicity Constraints � Shing Lee, Bin Cheng and Ying Kuen Cheung. Columbia University 12:20 PM Floor Discussion.

Session 103: Network Analysis (Invited)

Room: Gramercy, 4th floor Organizers: Tian Zheng, Columbia University; Tyler McCormick, Columbia University. Chair: Tian Zheng, Columbia University.


10:50 AM Multiscale Community Blockmodel for Network Exploration Eric Xing. Carnegie Mellon University 11:13 AM Estimating Latent Processes on a Graph from Indirect Measurements � Edoardo Airoldi and Alexander Blocker. Harvard University 11:36 AM Predicting Behavior with Social Networks � Sharad Goel and Daniel G. Goldstein. Yahoo! Research 11:59 AM Latent Space Models for Networks Using Aggregated Relational Data � Tyler McCormick1 and Tian Zheng2 . 1 Columbia University and University of Washington 2 Columbia University 12:22 PM Floor Discussion.

Session 104: Recent Advances in Genome-Wide Association Studies (Invited) Room: Pearl, 9th floor Organizer: Hongyu Zhao, Yale University. Chair: Hyonho Chun, Yale University.

10:50 AM Prediction with Scores of Tiny Effects: Lessons from Genome-Wide Association Studies � Nilanjan Chatterjee, Mitchell H. Gail and Ju-Hyun Park. National Cancer Institute, DCEG/BB 11:13 AM Testing and Estimation in Genome-Wide Association Studies through Penalized Splines � Yuanjia Wang and Huaihou Chen. Columbia University 11:36 AM A Powerful Association Test of Ordinal Traits in Samples with Related Individuals � Zuoheng Wang and Chengqing Wu. Yale University 11:59 AM Incorporating Biological Pathways via a Markov Random Field Model in Genome-Wide Association Studies � Min Chen1 , Judy Cho2 and Hongyu Zhao2 . 1 The University of Texas Southwestern Medical Center at Dallas 2 Yale University 12:22 PM Floor Discussion.

Session 105: Statistical Methodology and Regulatory Issues in Drug Development in Multiple Regions (Invited) Room: Majestic I, 5th floor Organizer: David Zhang, Genentech, Inc. Chair: Frank Shen, Abbott Laboratories.

10:50 AM Strategies for Multi-Regional Clinical Developments—A Quantitative Evaluation � William Wang1 , Huimin Liao2 and William Malbecq1 . 1 Merck & Co., Inc. 2 Fudan University 11:20 AM Empirical Shrinkage Estimator for Consistency Assessment of Treatment Effects in Multi-Regional Clinical Trials � Hui Quan1 , Mingyu Li2 , Weichung Joe Shih3 , Soo Peter Ouyang2 , Joshua Chen4 , Ji Zhang1 and Peng-Liang Zhao1 . 1 Sanofi-Aventis U.S. Inc. 2 Celgene Corporation 3 University of Medicine and Dentistry of New Jersey 4 Merck & Co., Inc. 11:50 AM Discussant: Frank Shen, Abbott Laboratories


12:20 PM Floor Discussion.

Session 106: Genomic Biomarker Applications in Clinical Studies (Invited) Room: Imperial, 4th floor Organizer: Zhaoling Meng, Sanofi-Aventis. Chair: Zhaoling Meng, Sanofi-Aventis U.S. LLC.

10:50 AM Challenges in Use of Genomic Biomarker Qualification When Transitioning between Drug Development Phases—Experience from Cardiovascular, Oncology and Infectious Diseases Peggy Wong. Merck & Co., Inc. 11:13 AM A New Multi-Gene Classification Method for the Prediction of Drug Response � Haisu Ma1 and Zhaoling Meng2 . 1 Yale University 2 Sanofi-Aventis U.S. Inc. 11:36 AM An Integrative Genomics Paradigm for the Discovery of Novel Tumor Subtypes and Associated Cancer Genes � Ronglai Shen1 , Adam B. Olshen2 , Qianxing Mo1 and Sijian Wang3 . 1 Memorial Sloan-Kettering Cancer Center 2 University of California, San Francisco 3 University of Wisconsin-Madison 11:59 AM Discussant: Sue-Jane Wang, U.S. Food and Drug Administration 12:22 PM Floor Discussion.

Session 107: Statistical Modeling and Application in Systems Biology (Invited) Room: Booth, 3rd floor Organizer: Wenxuan Zhong, University of Illinois at Urbana-Champaign. Chair: Wenxuan Zhong, University of Illinois at Urbana-Champaign.

10:50 AM Statistical Modeling of RNA-Seq Ping Ma. University of Illinois at Urbana-Champaign 11:13 AM Global Patterns of RNA and DNA Sequence Differences in the Human Transcriptome Mingyao Li. University of Pennsylvania School of Medicine 11:36 AM Bayesian Inference of Interaction in HIV Drug Resistance � Jing Zhang1 , Tingjun Hou2 , Wei Wang3 and Jun Liu4 . 1 Yale University 2 Soochow University 3 University of California, San Diego 4 Harvard University 11:59 AM Phylogenetic Path to Event (PhyloPTE) Samuel Handelman, � Joseph Verducci, Daniel Janies and Jesse Kwiek. The Ohio State University 12:22 PM Floor Discussion.

Session 108: Statistical Methodologies (Invited)

Room: Belasco, 3rd floor Organizer: Gang Li, Johnson & Johnson Pharmaceutical R&D U.S.. Chair: Gang Li, Johnson & Johnson Pharmaceutical R&D U.S..

10:50 AM Some Drop-the-Loser Designs for Monitoring Multiple Doses � Joshua Chen1 , David DeMets2 and Gordon Lan3 . 1 Merck & Co., Inc. 2 University of Wisconsin-Madison 3 Johnson & Johnson


11:13 AM Linear and Nonlinear Boundary Crossing Probabilities for Brownian Motion and Its Application in Predicting Bankruptcy James C. Fu. University of Manitoba

Session 110: Statistical Methods for High-Dimensional Data or Large Scale Studies (Contributed)

11:36 AM Statistical Properties of Parasite Density Estimators in Malaria and Field Applications � Imen Hammami1 , André Garcia2 and Grégory Nuel1 . 1 Applied Mathematics at Paris Descartes 2 Institut de Recherche pour le Développement

10:50 AM The Effect of Heterogeneity on Statistical Evaluation of Interventions: An Empirical Study � Depeng Jiang1 , Debra J. Pepler2 and Leena K. Augimeri3 . 1 University of Manitoba 2 York University 3 Child Development Institute

11:59 AM One-Step Weighted Composite Quantile Regression Estimation of DTARCH Models Jiancheng Jiang. University of North Carolina at Charlotte 12:22 PM Floor Discussion.

11:08 AM Multilayer Correlation Structure of Microarray Gene Expression Data � Linlin Chen1 , Lev Klebanov2 and Anthony Almudevar3 . 1 Rochester Institute of Technology 2 Charles University 3 University of Rochester

Session 109: Functional and Nonlinear Curve Analysis (Contributed) Room: New Amsterdam, 9th floor Chair: Yixin Fang, New York University.

10:50 AM Predict the Effectiveness of Physical Therapy in Treating Lumbar Disc Herniation Based on a LOGISTIC Curve Model � Xueying Li1 , Lin Wang2 , Zhen Huang2 , Xiaoping Kang2 and Chen Yao3 . 1 Peking University First Hospital 2 Department of Rehabilitation, Peking University First Hospital 3 Peking University First Hospital 11:08 AM Modeling and Forecasting Functional Time Series � Cong Feng, Lily Wang and Lynne Seymour. Department of Statistics, University of Georgia 11:26 AM Joint Modeling of Longitudinal and Survival Data Using Markov Threshold Regression � Michael Pennell1 , Xin He2 and Mei-Ling Ting Lee2 . 1 The Ohio State University 2 University of Maryland 11:44 AM Semiparametric Bayes Local Additive Models for Longitudinal Data � Zhaowei Hua1 , Hongtu Zhu1 and David B. Dunson2 . 1 University of North Carolina at Chapel Hill 2 Duke University 12:02 PM Issues and Adjustment for Bioassay Nonlinear Dose Response Parameter Estimate � Rong Liu1 , Jane Liao1 and Jason Liao2 . 1 Merck Research Laboratories 2 Teva Pharmaceuticals 12:20 PM Floor Discussion.

Room: Royale, 9th floor Chair: Philip Reiss, New York University.

11:26 AM Bayesian Model Averaging as a Natural Choice for Differential Gene Expression Studies � Xi Kathy Zhou1 , Fei Liu2 and Andrew J. Dannenberg1 . 1 Weill Cornell Medical College 2 IBM Thomas J. Watson Research Center 11:44 AM Importance of Statistical Pre-Processing of TMA Biomarker Data for Possible Spatial Bias � Daohai Yu1 , M. J. Schell1 , Z. Zheng1 , B. Rawal1 and G. Bepler2 . 1 H. Lee Moffitt Cancer Center & Research Institute 2 Karmanos Cancer Institute, Wayne State University 12:02 PM A Likelihood-Based Framework for Association Analysis of Allele-Specific Copy Numbers � Yijuan Hu, Wei Sun and Danyu Lin. Department of Biostatistics, University of North Carolina at Chapel Hill 12:20 PM Floor Discussion.

Session 111: Special presentation: CDISC Standard Development, Implementation Strategy, and Case Study (Invited) Room: Minskoff, 9th floor Organizer: Jian Chen, EDETEK, Inc.. Chair: Steve Kopko, CDISC.

10:45 AM Panel presentation Frank Newby, � Steve Kopko, � Sunny Xie and � Jian Chen. Department of Epidemiology and Population Health, Albert Einstein College of Medicine of Yeshiva University 12:10 PM Floor Discussion.


BRIGHTECH INTERNATIONAL LLC A Statistical and Clinical Data Solution

285 Davidson Ave, Suite 504, Somerset, NJ 08873, USA Phone: (908) 790-8888, Fax: (908) 340-1888 Email:[email protected], Web: www.brightech-intl.com Brightech is a CRO consisting of 30 employees with 15+ years average experience in the pharmaceutical and health-related industries. We provide high quality services with competitive price in data management, statistics, SAS programming, medical writing and clinical operations in China for pharmaceutical and biotech companies. Our core services include

- Data management (systems: OC/RDC; Clinical Information Management System (CIMS))
- Statistics (study design, protocol/SAP writing, regulatory strategic consultation and FDA interaction)
- SAS programming (TFL generation, SDTM mapping, ADaM creation)
- Medical writing (CSR, IB, ISS/ISE, manuscript)
- Clinical operations in China (CTA application, conduct trials)

Brightech is dedicated to providing the highest level of satisfaction to our clients. Our promises to our clients are quality, efficiency and completeness.

Tom Xie, Ph.D., President & CEO, Brightech International, 45 Hillcrest Ave, Berkeley Heights, NJ 07922. Email: [email protected]

Contact Us: [email protected] 1300 Virginia Drive, Suite 103 Fort Washington, PA 19034 215-283-6035

K&L Consulting Services, Inc. is a Contract Research Organization (CRO) founded in 1995 and headquartered in Fort Washington, PA.

K&L supports various agency submissions for clinical trials from Phase I through IV and in therapeutic areas including Oncology, Neurology, Cardiovascular, Allergy, GI, Ophthalmology, Anti-infective, PK/PD, Metabolic disease, Dermatology, Anti-inflammatory/pain, and Vaccines.

K&L specializes in providing services including Biostatistics, SAS programming, Data management/EDC, CDISC/SDTM development, e-Submission and Medical writing. K&L has a special SDTM group handling CDISC standardization, data migration/conversion, define documents, and e-Submission data packages. In 2010, K&L was involved with 5 NDA FDA filings, 4 ISS/ISE studies, more than 50 ongoing or completed studies and 80 protocols for SDTM generation. Our experienced staff, strong management, high standards of quality makes us a unique service provider and partner in the Pharmaceutical industry.

Abstracts


Session 1: Statistical Challenges Arising from Design and Analysis of NIH Studies Challenges Arising from National Heart, Lung, and Blood Institute Clinical Trials Nancy L. Geller National Heart Lung and Blood Institute [email protected] The National Heart, Lung, and Blood Institute (NHLBI) undertakes large clinical trials in heart, lung, and blood diseases and sleep disorders which pose many challenges in design, implementation and analysis. Two trials will be used as examples of problems that have recently arisen and the solutions chosen. The COAG (Clarification of Optimal Anti-coagulation through Genetics) trial will be used to illustrate whether to study all eligible subjects or a specific subgroup; whether to use a time to event endpoint or a surrogate; and how to achieve double blinding when different subjects get different doses of a drug. A trial in hematopoietic stem cell transplantation for multiple myeloma will illustrate a study in which randomization for the primary comparison was not feasible and probe the consequences of a parallel non-randomized design. The studies sponsored by NHLBI present stimulating statistical situations in which multiple potential solutions should be assessed. Some Problems Arising in the Development and Evaluation of Risk Models Mitchell H. Gail National Cancer Institute, DCEG [email protected] Models of absolute risk of disease incidence are useful for counseling individuals, assessing disease burden, and implementing public health strategies. Likewise, risk models for disease prognosis following diagnosis are often useful in devising management strategies. Despite the considerable literature on the development and evaluation of risk models, new challenges arise as one attempts to incorporate information from high-dimensional sources, such as genome-wide association studies. There has also been a resurgence of interest in methods to evaluate and compare such models, and in particular, to determine how the models perform in specific public health applications, rather than with respect to more general criteria, such as the area under the receiver operator characteristic (ROC) curve. Some of these issues are illustrated with breast cancer risk models. Using Group Testing to Evaluate Gene-Environment Interaction � Aiyi Liu1 , Chunling Liu2 , Paul Albert1 and Zhiwei Zhang1 1 National Institute of Child Health & Human Development 2 The Hong Kong Polytechnic University [email protected] Aimed at more efficient screening of a rare disease, Dorfman (1943) proposed to test for syphilis antigen by first testing pooled blood samples, followed by retesting of individuals in groups found to be infected. This strategy and its variations developed later, often referred to as group testing or pooled testing, has received substantial attention for efficient identification of an event or estimation of the probability that the event occurs.


In this paper we further investigate the optimality properties of group testing strategy in estimating the prevalence of a disease. We show that, when the disease status is measured with error, group testing with moderate group sizes provides more efficient estimation than the fully observed data over a wide range of disease prevalence. When the number of groups is fixed, group testing also prevails over the one-subject-per-group random sampling design for moderate disease prevalence. We discuss applications to evaluation of gene-environment interactions, and proposed a strata-based group testing strategy for such an evaluation. A Hybrid Parametric and Empirical Likelihood Ratio Statistic for Testing Interaction between Covariates in Case-Control Studies. � Jing Qin1 , Hong Zhang2 , Maria Teresa Landi2 , Neil E. Caporas2 and Kai Yu2 1 National Institute of Allergy and Infectious Diseases 2 National Cancer Institute [email protected] The case-control study design provides an effective way of collecting covariate information conditioning on subjects’ outcome status (normal or affected). The standard logistic regression model can be used to assess the interaction between two risk factors X and Y under such a design. Although the samples are collected retrospectively, it is well known that fitting a prospective logistic regression model provides the most efficient estimates for the coefficients of both the main effects and interactions, under a semi-parametric model without any specification on the covariate distribution. However, when certain constraints derived from auxiliary information are imposed on the covariate distribution, the standard prospective logistic regression method might no longer be the most efficient one. We develop a unified approach for the statistical inference of interaction between X and Y under the case-control design. We characterize the relationship between the two risk factors in the control population by specifying a parametric model for Y—X, the conditional distribution of Y given X, while leaving the distribution of the other components of the joint covariate distribution to be fully non-parametric. Then a maximum hybrid parametric and empirical likelihood method is proposed to make inference for the underlying parameters. The estimate and the associated test derived from the proposed semi-parametric model are suitable for evaluating interaction between two risk factors of various types (discrete or continuous). We derive asymptotic results for semi-parametric likelihood ratio statistics. We provide both simulation results and two real data examples to demonstrate the advantages of the proposed method over existing ones. Moreover we point out the proposed method can be used for the secondary outcome analysis.
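As a rough illustration of the Dorfman (1943) pooled-testing idea cited in the group-testing abstract above, the expected number of tests per subject under two-stage pooling has a simple closed form. The sketch below is not part of the talk, and the prevalence and pool sizes are hypothetical.

# Two-stage Dorfman pooling with a perfect test: one pooled test per group of
# size k, plus k individual retests whenever the pool is positive.
expected_tests_per_subject <- function(p, k) 1 / k + 1 - (1 - p)^k
p <- 0.02                                  # hypothetical prevalence of a rare condition
k <- 2:20                                  # candidate pool sizes
eff <- expected_tests_per_subject(p, k)
round(rbind(pool_size = k, tests_per_subject = eff), 3)
k[which.min(eff)]                          # pool size minimizing the expected workload

At a prevalence of roughly 2%, the expected workload stays well below one test per subject, which is the source of the efficiency gains the abstract refers to.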

Session 2: Statistical Challenges from Survey, Behavioral and Social Data A Corpus from a Single Survey � Andrew Gelman, Michael Malecki, Vincent Dorie and Wei Wang Columbia University [email protected] We discuss some recent projects involving multilevel modeling of ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts survey data and related challenges in assessing model improvements. Overfitting can be an issue even with cross validation. We discuss some open questions in model validation, using as a tool a new approach in which a single survey yields dozens of parallel cross-validation studies. On the Stationary Distribution of an Incompatible Gibbs Sampler � Jingchen Liu1 , Andrew Gelman1 , Jennifer Hill2 and Yu-Sung Su3 1 Columbia University 2 New York University 3 Tsinghua University [email protected] In this talk, we discuss the properties of an incompatible Gibbs sampler widely used for multiple imputations. The iterative (or chained) imputation, in which variables are imputed one at a time each given a model predicting from all the others, is a popular technique that can be convenient and flexible, as it replaces a potentially difficult multivariate modeling problem with relatively simple univariate regressions. The imputation distribution is then defined as the stationary (invariant) distribution of the corresponding Markov chain (an incompatible Gibbs sampler). We begin to characterize the convergence and stationary distributions of iterative imputations and their statistical properties. The central analysis lies in creating a coupling of two Markov processes. The results and analysis techniques can be applicable to studies that involve modeling through conditional distributions in general. Statistical Methods in Search Engine Ranking Daryl Pregibon1 , � Rachel Schutt1 , Ni Wang1 and Yong Li2 1 Google Research 2 Google [email protected] Modern search engines use a variety of signals that are combined to produce a single ranked list of results. We focus on using individual users’ preferences from serving logs and illustrate how statistical methods can be used to produce a click-based signal that can be incorporated into the aggregate ranking function. The ideas include conditional likelihood, generalized estimating equations, BradleyTerry models, empirical Bayes, and sparse “Lasso“ methods. Computational issues are also discussed and we adapt some existing algorithms that allow us to fit millions of regularized paired preference models. We conclude with a short discussion on evaluation by human raters that shows improved performance (over an existing method) and at the same time, applicability to more queries. Demographic Diversity on the Web M. Irmak Sirer1 , � Jake M. Hofman2 and Sharad Goel2 1 Northwestern University 2 Yahoo! Research [email protected] To what extent do the online experiences of, for example, men and women, or Whites and African-Americans differ? We address such questions by pairing web browsing histories for 265,000 anonymized individuals with user-level demographic data—including age, sex, race, education, and income. In one of the most comprehensive analyses of Internet usage patterns to date, we make three broad observations. First, while the majority of popular sites have diverse audiences, there are nonetheless, prominent sites with highly homogeneous user bases. For example, Fox News attracts millions of visitors each month, yet has an audience that is more than 90% White. Second, although most users spend a significant fraction of their time on email, search, and social networking ICSA Applied Statistics Symposium 2011, NYC, June 26-29

sites, there are still large group-level differences in how that time is distributed. For example, women spend a third more of their time on Facebook than do men, and aside from such universally popular destinations, the top sites frequented by different groups are relatively distinct. Finally, the between-group statistical differences enable reliable inference of an individual’s demographic attributes from browsing activity. We thus conclude that while the Internet as seen by different demographic groups is in some regards quite similar, sufficient differences persist so as to facilitate group identification.
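The iterative (chained) imputation scheme analyzed in the second abstract of this session can be sketched in a few lines. The toy below uses two Gaussian variables and simple linear models, an assumption made purely for illustration rather than the setting studied in the talk.

# Toy chained imputation: impute each variable from a regression on the other,
# cycling until the chain (hopefully) settles into a stationary distribution.
set.seed(1)
n <- 200
x <- rnorm(n); y <- 0.8 * x + rnorm(n, sd = 0.6)
x[sample(n, 40)] <- NA; y[sample(n, 40)] <- NA        # impose missingness
x_imp <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)   # crude starting values
y_imp <- ifelse(is.na(y), mean(y, na.rm = TRUE), y)
for (iter in 1:20) {
  fit_y <- lm(y_imp ~ x_imp)
  y_imp[is.na(y)] <- predict(fit_y)[is.na(y)] + rnorm(sum(is.na(y)), sd = summary(fit_y)$sigma)
  fit_x <- lm(x_imp ~ y_imp)
  x_imp[is.na(x)] <- predict(fit_x)[is.na(x)] + rnorm(sum(is.na(x)), sd = summary(fit_x)$sigma)
}
cor(x_imp, y_imp)   # one draw from the chain; whether a stationary law exists is the talk's question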

Session 3: Enhancing Probability of Success Using Modeling & Simulation Enhancing the Design of a Dose Ranging Study Using Modeling and Simulation � Yaming Hang and Devan Mehrotra Merck & Co., Inc. yaming [email protected] Modeling and simulation (M&S), a tool identified as an opportunity to improve drug development efficiency in FDA’s critical path initiative, has seen wider acceptance and broader adoption in the pharmaceutical industry in recent years. Pharmacometric models are now routinely developed to characterize, understand and predict a drug’s pharmacokinetic (PK) and pharmacodynamic (PD) behavior, to quantify uncertainty of information about the behavior, and coupled with simulation, to rationalize data-driven decision making in drug development. A case study will be presented to illustrate how M&S can be a powerful tool to enhance the selection of dose levels when designing a dose ranging study for a new compound. M&S was utilized by leveraging relevant knowledge on another compound in the same drug class. Population PK and PK/PD models were built using both preclinical and clinical data to estimate the dose-exposure-response relationship for the new compound. Clinical trial simulations were then performed to predict trial outcome and assess probability of success for different combinations of dose levels, and an informed decision was made based on M&S work. Enhancing Probability of Selecting the Best Compound Vlad Dragalin ADDPLAN, An Aptiv Solutions company [email protected] The sponsor may have up to three compounds simultaneously approaching the POC stage in the same therapeutic area: subjects with mild to moderate Alzheimer’s disease. A conventional development strategy is to investigate these three compounds in a sequential manner, one after another in separate trials. The conventional design of each such study is a multicenter, randomized, doubleblind, placebo-controlled trial with two active arms (low, high) and placebo in a 1:1:1 randomization, all as adjunctive to background therapy. This conventional development strategy is compared and contrasted with an adaptive compound finder proof of concept study design that investigates several compounds in a single trial. The objective is to find with high probability the “best“ compound using adaptive allocation of subjects to competing treatments. The primary endpoint for comparing the efficacy of the compounds is the change from baseline at 12 months in ADAS-Cog. A maximum sample size of 450 subjects is utilized to adaptively allocate to 6 active treatments (low and high doses for each compound), all as adjunctive to background therapy, and a placebo (standard of care). An early stopping


for efficacy or futility is utilized. The comparison of these two design strategies is done through intensive simulations. Response data are simulated under a dozen possible scenarios and the two strategies are compared on different operating characteristics: the average number of subjects, the average study duration, and the probability of correctly identifying the “best” compound. An “Exposure”-Response Modeling Approach to Support PoC Decision Making - A Case Study Chyi-Hung Hsu Johnson & Johnson Pharmaceutical RD, L.L.C. [email protected] The ultimate objective of drug development is to bring more and better medicines to market in the shortest possible time. A key decision point in drug development comes after conducting the proof of concept (PoC) studies. These studies allow a preclinical hypothesis of mechanism of action to be tested, and their results are used to support the project team’s decision about further developing a drug candidate. This presentation outlines one such proof of concept trial where a new candidate was investigated to assess the dose response relationship and to evaluate the magnitude of its effect compared to placebo. A K-PD type of exposure response model, implemented under the Bayesian framework, was used to summarize the existing longitudinal information. This information then became the basis for designing various new PoC decision criteria. Simulations were used to examine the corresponding operating characteristics of these criteria, and to compare them with those of the conventional multiple comparison approach.
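Clinical trial simulation of the kind described in this session often reduces to repeatedly generating trial data under assumed dose effects and checking a decision rule. The sketch below is a deliberately simplified stand-in: the effect sizes, sample size, and success criterion are hypothetical and far cruder than the pharmacometric models discussed in the talks.

# Crude probability-of-success calculation for a parallel-group dose-ranging design
set.seed(1)
simulate_trial <- function(n_per_arm, true_means, sd = 1, alpha = 0.05) {
  placebo  <- rnorm(n_per_arm, mean = true_means[1], sd = sd)
  top_dose <- rnorm(n_per_arm, mean = true_means[length(true_means)], sd = sd)
  # "success" = one-sided superiority of the top dose over placebo
  t.test(top_dose, placebo, alternative = "greater")$p.value < alpha
}
true_means <- c(0, 0.2, 0.35, 0.5)   # assumed placebo and three dose-level effects
mean(replicate(2000, simulate_trial(n_per_arm = 60, true_means = true_means)))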

Session 4: Financial Statistics, Risk Management, and Rare Event Modeling Extreme Temperatures and CME Temperature Derivatives Debbie Dupuis HEC Montr`eal [email protected] The CME defines the average daily temperature underlying its contracts as the average of the maximum and minimum daily temperatures, yet all published work on temperature forecasting for pricing purposes ignored this peculiar definition of the average and sought to model the average temperature directly. Here, we cast the average temperature forecasting problem as one in the analysis of extreme values. The theory of extreme values guides model selection for temperature maxima and minima, and a forecast distribution for the CME’s daily average temperature is found through convolution. While univariate time series AR-GARCH models and regression models generally yield superior point forecasts of temperature, our extreme-value-based model consistently outperforms these models in density forecasting, the most important risk management tool. The Connection Between the Logit Model, Normal Discriminant Analysis Model, and the Multivariate Normal Mixtures Weihu Cheng Beijing University of Technology [email protected] The relationship between the logit model, normal discriminant analysis (NDA) model, and the mixtures of multivariate normal distributions (MMND) model is discussed. It is shown that the likelihood equations for MMND model can be obtained from the likelihood


equations for the NDA model by simply replacing the index variables with weights that are the logistic probabilities, and that all three models use the same linear discriminant function to classify observations into different populations. Some implications of these relationships for data analysis are also discussed. HYBRID-GARCH: A Generic Class of Models for Volatility Predictions Using Mixed Frequency Data Xilong Chen1 , Eric Ghysels2 and � Fangfang Wang3 1 SAS Institute Inc. 2 University of North Carolina at Chapel Hill 3 University of Illinois at Chicago [email protected] We propose a general GARCH framework that allows the predict volatility using returns sampled at a higher frequency than the prediction horizon. We call the class of models High FrequencY DataBased PRojectIon-Driven GARCH, or HYBRID-GARCH models, as the volatility dynamics are driven by what we call HYBRID processes. The HYBRID processes can involve data sampled at any frequency. On the Estimation of Integrated Covariance Matrices of High Dimensional Diffusion Processes � Xinghua Zheng and Yingying Li The Hong Kong University of Science and Technology [email protected] We consider the estimation of integrated covariance (ICV) matrices of high dimensional diffusion processes based on high frequency observations. We start by studying the most commonly used estimator, the realized covariance (RCV) matrix. We show that in the high dimensional case when the dimension p and the observation frequency n grow in the same rate, the limiting spectral distribution (LSD) of the RCV matrix depends on the covolatility process not only through the targeting ICV matrix, but also on how the covolatility process varies in time. We establish a Marcenko-Pastur type theorem for weighted sample covariance matrices, based on which we further establish a Marcenko-Pastur type theorem for RCV matrices for a class C of diffusion processes. The results explicitly demonstrate how the time-variability of the covolatility process affects the LSD of RCV matrix. We then propose an alternative estimator, the time-variation adjusted realized covariance (TVARCV) matrix. We show that for diffusion processes in class C, the TVARCV matrix possesses the desirable property that its LSD depends solely on that of the targeting ICV matrix through a Marcenko-Pastur equation.
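The realized covariance (RCV) matrix studied in the last abstract of this session is simply the sum of outer products of high-frequency return vectors. The sketch below simulates independent returns purely to show the computation; the dimensions and noise level are arbitrary and carry none of the covolatility structure the talk is about.

# Realized covariance from n intraday return vectors of dimension p
set.seed(1)
p <- 5; n <- 390                             # e.g., 5 assets observed at one-minute intervals
returns <- matrix(rnorm(n * p, sd = 1e-3), nrow = n)
rcv <- crossprod(returns)                    # equals the sum over t of r_t %*% t(r_t)
eigen(rcv, symmetric = TRUE)$values          # the spectrum whose high-dimensional limit is studied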

Session 5: Innovative Drug Safety Graphics Presentation Wiki Resources for Improving Your Statistical Graphs Richard Forshee U.S. Food and Drug Administration, CBER/OBE [email protected] With today’s modern software, it’s never been easier to make bad statistical graphs. The General Principles sub-team of the Industry/FDA/Academia Safety Graphics Working Group has created several resources to help you design high-quality statistical graphs that help readers develop a rich understanding of the data and analysis you are presenting. The resources include 1) a set of flowcharts designed to help statisticians choose the most appropriate type of graph depending on the type of data and analysis; 2) a set of wiki pages describing classes of graphs, for example histograms and scatterplots; 3) a list of “do’s and don’ts” for statistical graphs; and 4) lots of examples of effective statistical graphs and the software ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts code used to produce them. Using improved statistical graphs will enhance communication among all stakeholders and advance public health. These resources will be explored during the presentation. Let Graphs Speak for You: Examples of Using Adverse Event Graphics for Safety Signal Detection Liping Huang Roche Products Limited [email protected] Currently most statistical summaries of adverse events from clinical trials are displayed in lengthy tables or listings. It is challenging for both reviewers and decision makers to use these presentations to identify and/or understand potential safety signals in order to adequately characterize the safety profile of pharmaceutical products in the context of their intended indication, and ultimately to make decisions regarding benefit-risk. The purpose of this presentation is to document the clinical questions most frequently encountered in reviewing adverse events, and to recommend a set of graphics that were developed by representatives of the Industry/FDA/Academia Safety Graphics Working Group to address these questions and communicate important safety information more effectively and efficiently. Organized and Effective Interpretation of Clinical Laboratory Data: Graphs Make a Difference Robert Gordon Johnson & Johnson [email protected] As part of the Industry/FDA/Academia Safety Graphics Working, the Labs & Liver Subgroup has been focused on collecting, organizing and developing innovative graphics to interpret the large, detailed clinical laboratory data collected in clinical trials. Efficient and effective interpretation of the vast amount of laboratory data is gained through the use of these innovative graphical displays. The displays presented will emphasize the advantages of using data visualization approaches over traditional use of tabular summaries. In addition, displays of liver toxicity will be shared due to recent emphasis on presenting liver function data via graphical approaches. Graphical Presentations for ECG and Vitals Data � R.J. Anziano1 , R. Fiorentino2 , E. Frimpong2 , A. Paredes2 , P. Bridge3 and L. Huang3 1 Pfizer Inc. 2 U.S. Food and Drug Administration, CDER 3 Roche Products Limited [email protected] Graphical approaches will be shown that elicit key information for ECG and vital sign data collected in clinical trials. The graphical approaches are based upon those developed within a smaller working group that is part of the larger Industry/FDA/Acacemia Safety Graphics Working Group. Emphasis will be based upon graphical approaches for assessing the QT interval and QT corrections. There will be a discussion of how most of the graphs can be extended to other intervals and vitals measures. An invitation to CTSPedia to view the graphics along with the sample datasets used to create them will be shown.

Session 6: Topics in Statistical Machine Learning and High Dimensional Data Analysis Sparsity Inducing Credible Sets for High-Dimensional Variable Selection Howard Bondell ICSA Applied Statistics Symposium 2011, NYC, June 26-29

North Carolina State University [email protected] For high-dimensional data, particularly when the number of predictors greatly exceeds the sample size, selection of relevant predictors for regression is a challenging problem. Methods such as sure screening, forward selection, or penalized regressions such as LASSO or SCAD are commonly used. Bayesian variable selection methods place prior distributions on the parameters along with a prior over model space, or equivalently, a mixture prior on the parameters having mass at zero. Since exhaustive enumeration is not feasible, posterior model probabilities are often obtained via long MCMC runs. The chosen model can depend heavily on various choices for priors and also posterior thresholds. Alternatively, we propose a conjugate prior only on the full model parameters and using sparse solutions within posterior credible regions to perform selection. These posterior credible regions typically have closed form representations, and it is shown that these sparse solutions can be computed via existing algorithms. The approach is shown to outperform common methods in the high-dimensional setting, particularly under correlation. By searching for a sparse solution within a joint credible region, consistent model selection is established. Furthermore, it is shown that the simple use of marginal credible intervals can give consistent selection up to the case where the dimension grows exponentially in the sample size. High-Dimensional Non-Linear Interaction Structures Peter Radchenko University of Southern California [email protected] Numerous penalization based methods have been proposed for fitting a traditional linear regression model in which the number of predictors, p, is large relative to the number of observations, n. Most of these approaches assume sparsity in the underlying coefficients and perform some form of variable selection. Recently, some of this work has been extended to non-linear additive regression models. However, in many contexts one wishes to allow for the possibility of interactions among the predictors. This poses serious statistical and computational difficulties when p is large, as the number of candidate interaction terms is of order p squared. We introduce a new approach that is based on a penalized least squares criterion and is designed for high dimensional non-linear problems. Our criterion is convex and enforces the heredity constraint, in other words if an interaction term is added to the model, then the corresponding main effects are automatically included in the model. Detailed simulation results demonstrate that the new method is computationally efficient and can be applied to non-linear models involving thousands of terms, while producing superior predictive performance over other approaches. We also provide theoretical conditions under which our method will select the correct main effects and interactions. These conditions suggest that the new method should outperform certain natural competitors when the true interaction structure is sufficiently sparse. Adaptively Weighted Large Margin Classifiers � Yichao Wu1 and Yufeng Liu2 1 North Carolina State University 2 University of North Carolina at Chapel Hill [email protected] Large margin classifiers have been shown to be very useful in many applications. The Support Vector Machine is a canonical example of large margin classifiers. 
Despite their flexibility and ability to handle high dimensional data, many large margin classifiers have serious drawbacks when the data are noisy, especially when there


are outliers in the data. In this paper, we propose a new weighted large margin classification technique. The weights are chosen adaptively with the data. The proposed classifiers are shown to be robust to outliers and thus are able to produce more accurate classification results. Parameter Estimation for Ordinary Differential Equations: An Alternative View on Penalty � Yun Li, Naisyin Wang and Ji Zhu Department of Statistics, University of Michigan [email protected] Dynamic modeling through solving ordinary differential equations has ample applications in the fields of physics, engineering, economics and biological sciences. The recently proposed parameter-cascades estimation procedure with a penalized estimation component (Ramsay et al., 2007) combines the strengths of basis-function approximation, profile-based estimation and computational feasibility. Consequently, it has become a very popular estimation procedure. In this manuscript, we take an alternative view through variance evaluation of the penalized estimation component within the parameter-cascades procedure. We find, through theoretical evaluation and numerical experimentation, that the penalty term in the profile component can increase estimation variation. Further, contrary to the traditional belief established in the penalized spline literature, this penalty term in the ordinary differential equations setup also makes the procedure more sensitive to the number of basis functions. By taking the penalty parameter to its limit, we propose an alternative estimation procedure. Our numerical experience indicates that, through the more time- and computation-consuming task of penalty parameter selection, the popular penalty-based method performs similarly to the proposed method. For other casually selected penalty parameters and numbers of basis functions, the proposed method outperforms the penalty-based method. We observe this phenomenon in a numerical study even when the underlying ordinary differential equations model is mis-specified. We provide theoretical properties for the proposed estimator. We illustrate its use and its comparisons to the penalty-based procedure in the analysis of simulated data and a lynx and hare predator-prey dynamic data set.
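As background to the ODE estimation abstract above (and not the authors' parameter-cascades or limiting procedure), a basic alternative is to integrate the ODE numerically and fit parameters by nonlinear least squares. The one-parameter decay model and noise level below are made-up choices for illustration.

# Fourth-order Runge-Kutta integration of a scalar ODE, then least-squares fitting
rk4 <- function(f, y0, times, theta) {
  y <- numeric(length(times)); y[1] <- y0
  for (i in 2:length(times)) {
    h <- times[i] - times[i - 1]; t0 <- times[i - 1]; yi <- y[i - 1]
    k1 <- f(t0, yi, theta)
    k2 <- f(t0 + h / 2, yi + h / 2 * k1, theta)
    k3 <- f(t0 + h / 2, yi + h / 2 * k2, theta)
    k4 <- f(t0 + h, yi + h * k3, theta)
    y[i] <- yi + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
  }
  y
}
decay <- function(t, y, theta) -theta * y           # dy/dt = -theta * y
set.seed(1)
times <- seq(0, 5, by = 0.25)
obs <- rk4(decay, y0 = 10, times = times, theta = 0.7) + rnorm(length(times), sd = 0.3)
sse <- function(theta) sum((obs - rk4(decay, 10, times, theta))^2)
optimize(sse, interval = c(0.01, 5))$minimum        # should land near the true value 0.7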

Session 7: High-Dimensional Feature Selection, Classification and Dynamic Modeling for Genetics Applications A Mathematical Framework for Functional Mapping of Complex Systems Using Delay Differential Equations Guifang Fu, Zhong Wang, Jiahan Li and � Rongling Wu The Pennsylvania State University [email protected] All biological phenomena occurring at different levels of organization from cells to organisms can be modeled as a dynamic system, in which the underlying components interact dynamically to comprehend its biological function. Such a systems modeling approach facilitates the use of biochemically and biophysically detailed mathematical models to describe and quantify “living cells,“ leading to an in-depth and precise understanding of the behavior, development and function of a biological system. Here, we illustrate how this approach can be used to map genes or quantitative trait loci (QTLs) that control a complex trait using the example of the circadian rhythm system which has been at the forefront of analytical mathematical modeling for many years. We integrate a system of biologically meaningful delay differential equations (DDEs) into


functional mapping, a statistical model designed to map dynamic QTLs involved in biological processes. The DDEs model the ability of circadian rhythm to generate autonomously sustained oscillations with a period close to 24h, in terms of time-varying mRNA and protein abundances. By incorporating the Runge-Kutta forth order algorithm within the likelihood-based context of functional mapping, we estimated the genetic parameters that define the periodic pattern of QTL effects on time-varying mRNA and protein abundances and their dynamic association as well as the linkage disequilibrium of the QTL and a marker. We prove theorems about how to choose appropriate parameters to guarantee periodic oscillations. We further used simulation studies to investigate how a QTL influences the period and the amplitude of circadian oscillations through changing model parameters. The model provides a quantitative framework for assessing the interplay between genetic effects of QTLs and rhythmic responses. High-Dimensional Classification Using Influential Multi-Factor Interactions Identified by SPV Algorithm � Jing-Shiang Hwang and Tsuey-Hwa Hu Academia Sinica, Taiwan [email protected] The most critical issue concerning performance of classification methods for high-dimensional data is the set of selected features which may consist of influential single factors and multi-factor interactions. Stepwise paring-down variation (SPV) algorithm is a new method for the identification of influential factors related to a continuous response variable. The algorithm is simple and feasible in computation as it only involves repeated runs of fitting analysis of variance models of exact one term. Specifically, it starts with a run of exhaust search for the most important single factor by fitting ANOVA models for each factor. If the factor with the largest effect is identified as influential, the estimated effects of this factor are subtracted from the responses to produce refined responses. The procedures of exhaust search for single factors are repeated for the next runs using the newly refined responses to reveal important ones. When no more important singles revealed, the same procedures used for screening single factors are applied to search for important two-factor interactions, and can be further continued to screen for higher multi-factor interactions. The main idea of SPV is to stepwise subtract the effects of identified factors in each run from the total variation of responses so that the remaining influential single factors and multi-factor interactions have increased chances of being identified in the next runs. When the responses are binary, a probability value can be generated from a beta posterior distribution for each response. So we can apply the SPV algorithm directly to data with the logit transformation of the generated probability values as responses. With each identified feature, we fit a logistic regression model to build a classifier. Finally, we combine these classifiers to construct a classification rule using a boosting algorithm. We demonstrate its competitive performance through simulation and real data analysis. 
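The single-factor screening loop of the SPV algorithm described above can be written compactly. The sketch below covers only the single-factor stage, uses a fixed number of steps instead of a formal stopping rule, and runs on simulated factors, so it is an illustration rather than the authors' implementation.

# One-term ANOVA scans with stepwise paring-down of identified effects
spv_single_factors <- function(y, factors, n_steps = 3) {
  selected <- character(0)
  for (step in seq_len(n_steps)) {
    ss <- sapply(factors, function(f) anova(lm(y ~ f))[1, "Sum Sq"])
    best <- names(which.max(ss))                     # factor explaining the most variation
    selected <- c(selected, best)
    y <- residuals(lm(y ~ factors[[best]]))          # refined responses for the next scan
  }
  selected
}
set.seed(1)
factors <- as.data.frame(replicate(10, factor(sample(0:2, 200, TRUE)), simplify = FALSE))
names(factors) <- paste0("F", 1:10)
y <- 1.5 * (as.integer(factors$F3) - 2) + rnorm(200) # only F3 is truly influential
spv_single_factors(y, factors, n_steps = 2)          # F3 should be picked up first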
A Classification Method Incorporating Interaction among Variables for High-Dimensional Data � Inchi Hu1 , Haitian Wang1 , Shaw-Hwa Lo2 and Tian Zheng2 1 The Hong Kong University of Science and Technology 2 Columbia University [email protected] In this talk, we will focus on a specific problem - the use of gene expression data to predict clinical outcomes of cancer, even though the method can be applied much more broadly. The problem is chalICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts lenging not just because the number of variables p is much greater than the number of observations n. What’s even more challenging is that one needs to consider the interactive effects among variables in addition to their individual marginal effects. We present a classification method incorporating interactions among variables using an influence measure introduced by Lo and Zheng (2002, 2004) as a basic tool. The classification rule is a boosting ensemble of logisticregression classifiers. Each classifier involves a cluster of variables, where interaction among variables in the cluster is explicitly incorporated. The proposed classification method is intended to have two desirable properties. First, the classification rule derived from the method has low error rates. Secondly, in the process of constructing the classification rule, influential variables responsible for the response are identified. That is, not only the classification result is accurate but also the classification rule contains important information in understanding the phenomenon under study. We applied the proposed classification method to three well-known gene expression miroarray datasets and obtain impressive results. High Dimensional ODEs Coupled with Mixed-Effects Modeling Techniques for Dynamic Gene Regulatory Network Identification Tao Lu1 , Hua Liang1 , Hongzhe Li2 and � Hulin Wu1 1

University of Rochester
2 University of Pennsylvania

[email protected] Gene regulation is a complicated process. The interaction of many genes and their products forms an intricate biological network. Identification of this dynamic network will help us understand the biological process in a systematic way. However, the construction of such a dynamic network is very challenging for a high-dimensional system. In this article we propose to use a set of ordinary differential equations (ODE), coupled with dimensional reduction by clustering and mixed-effects modeling techniques, to model the dynamic gene regulatory network (GRN). The ODE models allow us to quantify both positive and negative gene regulations as well as feedback effects of one set of genes in a functional module on the dynamic expression changes of the genes in another functional module, which results in a directed graph network. A five-step procedure, Clustering, Smoothing, regulation Identification, parameter Estimates refining and Function enrichment analysis (CSIEF) is developed to identify the ODE-based dynamic GRN. In the proposed CSIEF procedure, a series of cutting-edge statistical methods and techniques are employed, that include nonparametric mixedeffects models with a mixture distribution for clustering, nonparametric mixed-effects smoothing-based methods for ODE models, the smoothly clipped absolute deviation (SCAD)-based variable selection, and stochastic approximation EM (SAEM) approach for mixed-effects ODE model parameter estimation. The key step, the SCAD-based variable selection of the proposed procedure is justified by investigating its asymptotic properties and validated by Monte Carlo simulations. We apply the proposed method to identify the dynamic GRN for yeast cell cycle progression data. We are able to annotate the identified modules through function enrichment analyses. Some interesting biological findings are discussed. The proposed procedure is a promising tool for constructing a general dynamic GRN and more complicated dynamic networks. ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Session 8: Recent Developments in Design and Analysis for High-Dimensional Data Estimating False Discovery Proportion under Arbitrary Covariance Dependence Jianqing Fan, � Xu Han and Weijie Gu Princeton University [email protected] Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any genes are associated with some traits and those tests are correlated. When test statistics are correlated, false discovery control becomes very challenging under arbitrary dependence. In the current paper, we propose a new methodology based on principal factor approximation, which successfully subtracts the common dependence and weakens significantly the correlation structure, to deal with an arbitrary dependence structure. We derive the theoretical distribution for false discovery proportion (FDP) in large scale multiple testing when a common threshold is used and provide a consistent FDP. This result has important applications in controlling FDR and FDP. Our estimate of FDP compares favorably with Efron (2007)’s approach, as demonstrated by in the simulated examples. Our approach is further illustrated by some real data applications. A Statistical Framework for Illumina DNA Methylation Arrays � Pei Fen Kuan1 , Sijian Wang2 , Xin Zhou1 and Haitao Chu3 1 University of North Carolina at Chapel Hill 2 University of Wisconsin-Madison 3 University of Minnesota-Twin Cities [email protected] The Illumina BeadArray is a popular platform for profiling DNA methylation, an important epigenetic event associated with gene silencing and chromosomal instability. Limited statistical tools are available for analysis of such platform and current approaches rely on an arbitrary detection p-value cutoff for excluding probes and samples from subsequent analysis as a quality control step, which result in missing observations and information loss. It is desirable to have an approach that incorporates the whole data, but accounts for the different quality of individual observations. We first investigate and propose a statistical framework for removing the various source of biases. We then introduce a weighted model based clustering called LumiWCluster for Illumina BeadArray that weights each observation according to the detection p-values systematically and avoids discarding subsets of the data. LumiWCluster allows for discovery of distinct methylation patternsand automatic selection of informative CpG loci. We will also introduce a testing framework for detecting differential methylation specific to Illumina Methylation platform. We demonstrate the advantages of our proposed approach on several Illumina Methylation data sets. Designs for the Lasso Xinwei Deng1 , � C. Devon Lin2 and Peter Z.G. Qian1 1 University of Wisconsin-Madison 2 Queen’s University [email protected] We propose an approach using nearly orthogonal Latin hypercube designs, originally motivated by computer experiments, to significantly enhance the accuracy of the Lasso procedure. Systematic methods for constructing such designs are presented. The effectiveness of the proposed method is illustrated with several examples.
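The distinction between the false discovery proportion (FDP) of a single experiment and its expectation, central to the first abstract of this session, is easy to see in a toy one-factor simulation. The dependence model, signal strength, and threshold below are arbitrary choices, not the principal factor approximation proposed in the talk.

# FDP across repeated experiments when null test statistics share a common factor
set.seed(1)
m <- 1000; m1 <- 50; rho <- 0.7; z_cut <- qnorm(1 - 0.001)
one_run <- function() {
  w <- rnorm(1)                                      # common factor shared by all tests
  z <- sqrt(rho) * w + sqrt(1 - rho) * rnorm(m)      # correlated standard normal statistics
  z[1:m1] <- z[1:m1] + 3                             # a handful of true signals
  rejected <- z > z_cut
  sum(rejected[-(1:m1)]) / max(sum(rejected), 1)     # FDP for this realization
}
fdp <- replicate(500, one_run())
c(mean_FDP = mean(fdp), sd_FDP = sd(fdp))            # under dependence the FDP is highly variable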


Abstracts Graphical Model with Ordinal Variables � Jian Guo, Liza Levina, George Michailidis and Ji Zhu University of Michigan [email protected] Existing graphical models mainly consider the variables either in numerical scale or in nominal scale. In this paper, we propose a new graphical model characterizing another important type of variables—ordinal variables, which consist of a limited number of levels with a natural order. Examples of ordinal variables include the user rating records for online movies or music. In the proposed model, namely probit graphical model, we assume these ordinal variables are discretized from the corresponding latent numeric variables, which jointly follows a multivariate Gaussian distribution and whose partial correlations can be used to characterize the dependence relationship between the original ordinal variables. Under this modeling framework, we developed an EM-like algorithm to recover the underlying Gaussian graphical structures. The proposed model exhibits its superior performance over the Gaussian graphical model on a few synthetic ordinal data sets. It was also applied to exploring the graphical structures between a number of movies based on their ratings by users and some interesting patterns were discovered.

Session 9: Pharmaceutical Safety Graphic Display for Summarizing Individual Responses in Crossover Designed Human Abuse Potential Studies Ling Chen U.S. Food and Drug Administration [email protected] The human abuse potential study plays a critical role in understanding whether a drug produces positive subjective responses indicative of abuse potential. This type of study has crossover designed investigation with multiple treatments and multiple abuse potential measures. Sponsors often provide mean time course profiles for each abuse potential measure by treatment, but this does not provide information about time to peak or peak duration for individual subjects. This presentation will propose a graphic method to display individual responses in a crossover study. This graphic method will provide an easy tool for the Controlled Substance Staff (CSS) at FDA and Sponsors to visually evaluate whether individual responses to each treatment are different from each other, and also provide a tool to investigate the time to peak response and the duration of the peak response as well as extreme responses on a subject base. Some Statistical Issues in a Safety Clinical Trial - Thorough QT/QTc Study Joanne Zhang U.S. Food and Drug Administration [email protected] The necessity of conducting the so-called thorough QT/QTc study (TQTS) for most new pharmaceutical compounds as well as for new indications and new administration routes of existing compounds is now well recognized. The purpose of ICH E14 guidance is to provide recommendations to sponsors conducting such clinical studies to exclude the potential delayed cardiac repolarization of study drugs. Nevertheless, there is still a spectrum of encountered problems with details of design, conduct, analysis, and interpretation of such studies. In this presentation, the focus will be on establishing assay sensitivity for a TQTS, validating correction methods after adjusting QT intervals by its heart rates, and efficiently calculating


the sample size based on both statistical tests for the study drug and the positive control. Meta-Analysis for Drug Safety Evaluation with Rare Outcomes � Xiao Ding and Mat Soukup U.S. Food and Drug Administration [email protected] Meta-analysis has become increasingly popular in medical research and pharmaceutical evaluation, as a statistical tool to combine information from several independently performed studies. A further concern of meta-analysis arises when the outcomes are rare events. Most of the current meta-analysis methods are based on large sample approximations, and may be unsuitable when the outcomes are rare. In practice, simply excluding trials with zero events from the analysis, or arbitrarily adding a continuity correction such as 0.5 to those trials, is the standard procedure. These procedures may lead to invalid conclusions. In this presentation, the performance of several classical meta-analysis methods as well as the newly proposed method will be discussed under the situation of rare outcomes.
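To see why rare-event meta-analysis is delicate, compare a Mantel-Haenszel pooled odds ratio, which tolerates zero cells, with an inverse-variance pool that relies on 0.5 continuity corrections. The five trials below are invented for illustration, and neither estimator is the new method referred to in the abstract.

# Two standard pooled odds ratios on made-up rare-event trial data
events_trt <- c(1, 0, 2, 0, 1); n_trt <- c(200, 150, 220, 180, 210)
events_ctl <- c(0, 0, 1, 1, 0); n_ctl <- c(200, 150, 220, 180, 210)
a <- events_trt; b <- n_trt - a; c0 <- events_ctl; d <- n_ctl - c0; n <- n_trt + n_ctl
or_mh <- sum(a * d / n) / sum(b * c0 / n)            # Mantel-Haenszel, no correction needed
ac <- a + 0.5; bc <- b + 0.5; cc <- c0 + 0.5; dc <- d + 0.5
lor <- log(ac * dc / (bc * cc)); v <- 1 / ac + 1 / bc + 1 / cc + 1 / dc
or_iv <- exp(sum(lor / v) / sum(1 / v))              # inverse-variance pool after 0.5 correction
c(MH = or_mh, IV_corrected = or_iv)                  # the two summaries can disagree noticeably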

Session 10: Lifetime Data Analysis Two Criteria for Evaluating Risk Prediction Models � Ruth Pfeiffer and Mitchell H. Gail National Cancer Institute [email protected] We propose and study two criteria to assess the usefulness of models that predict risk of disease incidence for screening and prevention, or the usefulness of prognostic models for management following disease diagnosis. The first criterion, the proportion of cases followed PCF(q), is the proportion of individuals who will develop disease who are included in the proportion q of individuals in the population at highest risk. The second criterion is the proportion needed to follow-up, PNF(p), the proportion of the general population at highest risk that one needs to follow in order that a proportion p of those destined to become cases will be followed. PCF(q) assesses the effectiveness of a program that follows 100q% of the population at highest risk. PNF(p) assesses the feasibility of covering 100p% of cases by indicating how much of the population at highest risk must be followed. We show the relationship of PCF and PNF to the Lorenz curve and its inverse, and present distribution theory for their estimates. We develop methods for inference for a single risk model, and also for comparing the PCFs and PNFs of two risk models, both of which were evaluated in the same validation data. A Semiparametric Threshold Model for Censored Longitudinal Data Analysis � Jialiang Li and Wenyang Zhang National University of Singapore [email protected] Stimulated by the investigation of the relationship between blood pressure change and progression of microalbuminuria (MA) among individuals with type I diabetes, in this paper, we propose a new semiparametric threshold model for censored longitudinal data analysis and a semiparametric BIC type criterion to identify the parametric component in the proposed model. We treat the cluster effects in the proposed model as unknown fixed effects. An estimation procedure is proposed to estimate the unknown functions and constants in the proposed model. Asymptotic properies are established to justify the proposed estimation. A quadratic approximation is proposed to implement the proposed estimation procedure. Based on such approximation, the proposed estimation procedure ICSA Applied Statistics Symposium 2011, NYC, June 26-29

becomes very easy to implement, and the implementation does not need the computation of any multiple integral or any iterative algorithm. Simulation studies show the proposed estimation, model selection and their implementation algorithm work very well. Finally, the proposed model and estimation procedure are used to analyse the Wisconsin Diabetes data, which leads to some interesting findings. Analysis of Survival Data with Missing Censoring Indicators Gregg Dinse The National Institute of Environmental Health Sciences [email protected] In an analysis of censored survival data, the censoring indicator may be missing or unknown for some subjects. This talk describes three nonparametric, kernel-based methods for analyzing right-censored survival data when some censoring indicators are missing at random. The three approaches use regression calibration, imputation, and inverse probability weighting techniques. Asymptotic normality and other large-sample results are discussed. Finite-sample performance is evaluated via simulation. The proposed methods are motivated and illustrated with right-censored event-time data for a quality-of-life endpoint from a cancer clinical trial. Connecting Threshold Regression and Accelerated Failure Time Models � Xin He1 , G. A. Whitmore2 and Mei-Ling Ting Lee1 1 University of Maryland 2 McGill University [email protected] The accelerated failure time model is one of the most commonly used alternatives to the Cox proportional hazards model when the proportional hazards assumption is violated. Threshold regression is a relatively new alternative model for analyzing time-to-event data with non-proportional hazards. It is based on first-hitting-time models, where the time-to-event data can be modeled as the time at which the stochastic process of interest first hits a boundary or threshold state. This paper compares threshold regression and accelerated failure time models and demonstrates the situations when the accelerated failure time model becomes a special case of the threshold regression model. Three illustrative examples from clinical studies are provided.
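The two criteria defined in the first abstract of this session have direct empirical counterparts once predicted risks and case status are available. The simulated risks below are placeholders, and the estimators shown ignore the distribution theory developed in the talk.

# Empirical PCF(q) and PNF(p) from predicted risks and observed case status
set.seed(1)
risk <- rbeta(5000, 1, 9)                    # hypothetical predicted absolute risks
case <- rbinom(5000, 1, risk)                # cases generated from those risks
pcf <- function(q, risk, case) {             # proportion of cases captured in the top 100q% of risk
  top <- risk >= quantile(risk, 1 - q)
  sum(case[top]) / sum(case)
}
pnf <- function(p, risk, case) {             # smallest top-risk fraction covering 100p% of cases
  ord <- order(risk, decreasing = TRUE)
  cum_cases <- cumsum(case[ord]) / sum(case)
  min(which(cum_cases >= p)) / length(risk)
}
c(PCF_10pct = pcf(0.10, risk, case), PNF_80pct = pnf(0.80, risk, case))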

Session 11: High Dimensional Statistical Learning A ROAD to Classification in High Dimensional Space � Jianqing Fan1 , Yang Feng2 and Xin Tong1 1 Princeton University 2 Columbia University [email protected] For high-dimensional classification, it is well known that naively performing the Fisher discriminant rule leads to poor results due to diverging spectra (Bickel and Levina, 2004) and noise accumulation (Fan and Fan, 2008). Therefore, researchers proposed independence rules to circumvent the diverse spectra, and sparse independence rule to mitigate the issue of noise accumulation. However, in biological applications, there are often a group of correlated genes responsible for clinical outcomes, and the use of the covariance information can significantly reduce misclassification rates. The extent of such error rate reductions is unveiled by comparing the misclassification rates of the Fisher discriminant rule and the independence rule. To materialize the gain based on finite samples, a Regularized ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Optimal Affine Discriminant (ROAD) is proposed based on a regularized Fisher discriminant. ROAD selects an increasing number of features as the penalization relaxes. Further benefits can be achieved when a screening method is employed to select a subset of variables. A constrained coordinate-wise descent (CCD) algorithm is also developed to solve the optimization problem related to ROAD. Oracle type of sampling properties are established. Simulation studies and real data analysis support our theoretical results and demonstrate the advantages of the new classification procedure under a variety of correlation structures. A delicate result on piecewise linear solution path is demonstrated for the ROAD optimization problem, which used to justify our CCD algorithm. SOFARE: Selection of Fixed and Random Effects in HighDimensional Longitudinal Data Analysis Yun Li1 , � Peter X. Song1 , Sijiang Wang2 and Ji Zhu1 1 University of Michigan 2 University of Wisconsin-Madison [email protected] SOFARE is a new algorithm of variable selection in regularized mixed-effects regression models for high-dimensional longitudinal data. The proposed regularization takes place simultaneously at both fixed effects and random effects, in which estimation and selection in the mean and covariance structures are carried out via penalized likelihood and penalized REML, respectively. An application of SOFARE is to detect any predictors that are nonlinearly associated with outcomes through semiparametric additive mixed-effects models. SOFARE enables us to automatically determine which predictors are unassociated, linearly associated or nonlinearly associated with outcomes. We demonstrate SOFARE using both simulation studies and real world data sets. The Screening and Ranking Algorithm to Detect DNA Copy Number Variations � Ning Hao1 , Yue Niu1 and Heping Zhang2 1 University of Arizona 2 Yale School of Public Health [email protected] A Copy number variation (CNV) is a segment of DNA where different copy numbers have been observed by comparing two or more genomes. It has recently gained considerable interest as a source of genetic variation that likely influences phenotypic differences. Usually, there are up to a million bio markers along the whole genome. It is a challenging task to find CNVs accurately and efficiently. The Screening and Ranking algorithm (SaRa) proposed by Niu and Zhang (2010) is a fast multiple change-point detection tool which can be applied to detect DNA copy number variations. In this talk, we report some extensions of the SaRa and recent developments in multiple change-point detection. In particular, an improved version of the SaRa is proposed to enhance the power to detect short segments. Moreover, we discuss applications for CNV detection for both single and multiple sequences. Loss Adaptive Modified Penalty in Variable Selection Tengfei Li1 , � Yang Feng2 , Wen Yu1 , Zhiliang Ying2 and Hong Zhang3 1 Fudan University 2 Columbia University 3 University of Science and Technology of China [email protected] For variable selection, balancing sparsity and stability is a very important task. In this work, we propose the Loss Adaptive Modified Penalty (LAMP) where the penalty function is adaptively changed


with the type of the loss function. For generalized linear models, we provide a unified form of the penalty corresponding to the specific exponential family. We show that LAMP can have asymptotic stability while achieving oracle properties. In addition, LAMP could be seen as a special functional of a conjugate prior. An efficient coordinate-descent algorithm is proposed and a balancing method is introduced. Simulation results show that LAMP has competitive performance compared with several well-known penalties.
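The local scan behind screening-and-ranking change-point detection, as in the SaRa abstract above, compares averages in flanking windows at every position. The bandwidth, signal, and ranking rule below are simplifications for illustration, not the SaRa procedure itself.

# Local diagnostic |right-window mean - left-window mean| along a sequence
local_diff <- function(x, h) {
  n <- length(x); d <- rep(NA_real_, n)
  for (i in h:(n - h)) {
    d[i] <- abs(mean(x[(i + 1):(i + h)]) - mean(x[(i - h + 1):i]))
  }
  d
}
set.seed(1)
x <- c(rnorm(100, 0), rnorm(30, 1.5), rnorm(100, 0))  # one short elevated segment
d <- local_diff(x, h = 10)
order(d, decreasing = TRUE)[1:5]                      # top-ranked positions should sit near 100 and 130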

Session 12: Multiplicity Issues and Predictive Modeling of Enrollment in Clinical Trials Multiplicity Problems Arising in Subgroup Analysis � Alex Dmitrienko and Brian Millen Eli Lilly and Company [email protected] This talk focuses on clinical trials pursuing tailored therapy objectives, eg, include evaluation of treatment effects in focused subpopulations (defined by demographics, clinical and genetic markers) in addition to standard analyses in the overall population. Inferences in the subpopulations are independent of inferences in the overall population and thus may result in regulatory claims even if there is no evidence of a beneficial effect in the overall population. We provide a summary of statistical methods used in tailored therapy trials, including methods for Type I error rate control and analysis considerations to support labeling. Mixture Gatekeeping Procedures with Clinical Trial Applications � Ajit Tamhane1 and Alex Dmitrienko2 1 Northwestern University 2 Eli Lilly and Company [email protected] Gatekeeping procedures address the problems of testing hierarchically ordered and logically related null hypotheses that arise in clinical trials involving multiple endpoints, multiple doses, noninferiority-superiority tests, subgroup analyses etc. In this talk we will review previous work and then introduce a very general and powerful method for constructing gatekeeping procedures based on the closure principle, called the mixture method. This method is capable of handling arbitrary logical restrictions among the hypotheses. A clinical trial example will be presented to illustrate the method. Empirical Bayes on Accrual Prediction for Multicenter trials H. Nguyen,1 , W. Strawderman2 and � C. Yu1 1 Pfizer Inc. 2 Rutgers University [email protected] Conducting clinical trials needs good statistical tools for the initial planning and for the ongoing monitoring of clinical trials. In this paper, we used empirical Bayesian framework to determine prior information. Using a Bayesian framework we combine prior information with the information known up to a monitoring point to obtain a prediction.
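One common Bayesian formulation of accrual prediction, not necessarily the one used by the authors of the last abstract, treats enrollment as Poisson with a gamma prior on the rate, so the remaining accrual has a negative binomial predictive distribution. All numbers below are hypothetical.

# Gamma-Poisson accrual prediction at an interim monitoring point
alpha0 <- 10; beta0 <- 2            # prior: about 5 subjects/week, worth 2 weeks of data
enrolled <- 48; weeks_observed <- 12
alpha1 <- alpha0 + enrolled         # posterior shape
beta1  <- beta0 + weeks_observed    # posterior rate (per week)
weeks_left <- 20
pred_mean <- weeks_left * alpha1 / beta1
pred_int  <- qnbinom(c(0.025, 0.975), size = alpha1, prob = beta1 / (beta1 + weeks_left))
c(expected_additional = pred_mean, lower = pred_int[1], upper = pred_int[2])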

Session 13: Statistical Issues in Late-Stage HIV Clinical Trials Challenges in Designing and Analyzing Confirmatory HIV Trials Guoxing Soon


U.S. Food and Drug Administration [email protected] With recent approval of several new drugs of new classes, like enfuviritide, raltegravir, and maraviroc, comes change and challenge in HIV trial design especially in treatment experienced population. Notably, many placebo (combined with background drugs) trials, used to be the standard, are no longer considered to be ethical in light of these new options. At the same time, non-inferiority comparison in the actively controlled studies is difficult because the current background drugs may differ substantially from the old background drugs which was used to approve the active comparator. In this talk I will cover thoughts behind recent changes in the primary endpoint and the changes in trial durations. I will discuss several types of trials with greater complexity. I will also share the agencies’ efforts in addressing some of these issues through metaanalysis of existing database. Design and Monitoring Benefits and Risks in HIV Clinical Trials Using Prediction Scott Evans Harvard University [email protected] Combination antiretroviral therapy has been identified as one of the top ten medical advances of the past decade. However the drugs comprising these powerful regimens can have toxic side effects. The evaluation of both the benefits and the risks of these drugs is a fundamental goal in the monitoring and analyses of HIV clinical trials. For example, Data Monitoring Committees that monitor HIV trials evaluate benefits and risks to make recommendations regarding further conduct of the trial. Although many methods for interim monitoring exist, few of these methods evaluate the joint magnitude of the benefits and risks. Furthermore none use prediction to convey information regarding potential effect size estimates and associated precision, with trial continuation. We propose use of prediction and “predicted rings“ as a flexible and practical tool for monitoring HIV trials. These methods will provide a valuable tool for Data Monitoring Committees and other decision-makers when evaluating interim data. Design and Endpoints for HIV Trials in Antiretroviral Treatment Naive Patients Mike Wulfsohn, Jim Rooney and � Lijie Zhong Gilead Sciences, Inc. [email protected] A well designed clinical trial will allow many questions to be answered. Amongst these questions are “How effective is a regimen?”ˇt and “How effective is a strategy?” Patients need both successful regimens and successful strategies, and answers to both these questions are clinically important. Regimen effectiveness focuses on the patient’s initial regimen, whereas strategy focuses on response to all regimens used. In industry we generally focus on the first question since to license a new treatment, it is necessary to show that the new product is effective, safe and well tolerated. In addressing the effectiveness of a regimen, patients who prematurely discontinue treatment or switch to alternative treatments are analyzed as non-responders. Regarding strategy, the question is what improvement would be observed in clinical practice if patients started treatment with one regimen rather than the other. In real life, patients may switch to alternative therapies, and patients care about virologic response regardless of whether they stayed on their baseline regimen. From the patient’s perspective, switches are not necessarily a sign of failure, especially ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts if this can be accomplished without any resistance mutations. Pure virologic analyses do however not address drug safety. The evaluation of regimen effectiveness is generally easier to interpret than strategies, since we are looking at data from baseline regimens as opposed to data from a more complex sequencing of treatments where the rollover therapies may be heterogeneous. In this presentation, examples from a Gilead sponsored study, Study 903, will be used to illustrate what endpoints/analysis methods should be used to address the two questions. Missing Data: It Is Better to Prepare and Prevent than to Repair and Repent Sara Hughes ViiV Healthcare [email protected] Considerable statistical research has been done in recent years to develop sophisticated statistical methods for handling missing data and dropouts in the analysis of clinical trial data. However, there has been less focus on proactive intervention prior to study initiation, and during study conduct, aimed at reducing the level of missing data for analysis although some examples exist. If statisticians and other clinical trial personnel proactively set out at the study initiation stage to assess the typical impact of missing data for the disease under study, and investigate ways in which to reduce dropouts, there is considerable potential to improve the clarity and quality of study results and also increase statistical power and efficiency. Clinical trial site staff are the individuals who directly interact with study subjects, and as such it is immediately clear they have a key role to play in the retention of those subjects. However, all the individuals involved in the design and execution of a clinical trial have a role to play in increasing retention and reducing missing data. This paper presents a case study from HIV in which the Statistical group led an initiative in order to investigate the impact of missing data and to explore ways to reduce missing data in future clinical trials, with the ultimate goal of improving quality and clarity of future trial results and increasing efficiency. The particular focus of this talk will be on the role the study statistician can play.

Session 14: New Developments in Methods for the Analysis of Repeated Measures Data Analyzing Repeated Measures Semi-Continuous Data, with Application to an Alcohol Dependence Study � Lei Liu1 , Robert L. Strawderman2 , Bankole Johnson1 and John O’Quigley1 1 University of Virginia 2 Cornell University [email protected] Two-part random effects models (Olsen and Schafer 2001, Tooze, Grunwald, and Jones 2002) have been applied to repeated measures of semi-continuous data, which involve a substantial proportion of zero values and are often skewed to the right and heteroscedastic in the positive values. In this paper, We make three extensions of the original random effects two-part model by assuming the positive values follow (a) a generalized Gamma distribution, (b) a log skew normal distribution, and (c) a normal distribution after BoxCox transformation. We account for heteroscedasticity in all models. The maximum likelihood estimates are obtained through adaptive Gaussian quadrature, which can be conveniently implemented in SAS Proc NLMIXED. The performance of the methods is compared, through applications to daily drinking records from a ranICSA Applied Statistics Symposium 2011, NYC, June 26-29

domized controlled trial of topiramate for alcohol-dependence treatment. We find that all three models are better than the log-normal model in Part II. The generalized Gamma distribution provides the best fit to data, and there exists strong evidence for heteroscedasticity. Estimation for Single-Index Mixed Effects Models � Liugen Xue1 and Zhen Pang0 1 Beijing University of Technology [email protected] In this paper, we consider a single-index mixed effects model with longitudinal data. The introduction of the random effects raises interesting inferential challenge. Instead of treating the variance components as nuisance parameters, we propose root-n consistent estimators for them. A new set of estimating equations is proposed to estimate the single-index coefficients. The link function is estimated by using the local linear smoothing. Asymptotic normality is established for the proposed estimators. Also, the estimator of the link function achieves optimal convergence rates. These results facilitate the construction of confidence regions/intervals and hypothesis testing for the parameters of interest. Some simulations and an application to real data are included to illustrate our methods. Improving the Convergence Rate in Mixed-Effects Models � Guangxiang Zhang and John J. Chen State University of New York at Stony Brook [email protected] Mixed-effects model has been widely used in hierarchical and longitudinal data analyses. In practice, the fitting algorithm can fail to converge due to boundary issues of the estimated random-effects covariance matrix G. Current available algorithms are not computationally optimal because the condition number of G is unnecessarily increased when the random-effects correlation estimate is not zero. The traditional mean centering technique may even increase the random-effects correlation. To improve the convergence of data with such boundary issue, we propose an adaptive fitting (AF) algorithm using an optimal linear transformation of the random-effects design matrix. The AF algorithm can be easily implemented with standard software and be applied to other mixed-effects models. Simulations show that AF significantly improves the convergence rate, and reduces the condition number and non-positive definite rate of the estimated G, especially under small sample size, relative large noise and high correlation settings. One real life data for Insulin-like Growth Factor (IGF) protein is used to illustrate the application of this algorithm implemented with software package R (nlme). Estimating Multiple Treatment Effects Using Two-Phase Regression Estimators � Cindy Yu1 , Jason Legg2 and Bin Liu1 1 Iowa State University 2 Amgen Inc. [email protected] We propose a semiparametric two-phase regression estimator with a semiparametric generalized propensity score estimator for estimating average treatment effects in the presence of informative firstphase sampling. The proposed estimator is shown to be easily extendable to any number of treatments and does not rely on a prespecified form of the response or outcome functions. The estimator is shown to asymptotically outperform standard estimators such as the ˇ estidouble expansion estimator and eliminate bias found in nadve mators that ignore the first-phase sample design such as matching and inverse propensity weighted estimators. Potential performance


Abstracts gains are demonstrated through a simulation study. Mixed-Effects Models for Evaluating Cardiac Function and Treatment Effects � Maiying Kong, Hyejeong Jang and Daniel J. Conklin University of Louisville [email protected] The mixed-effects model is an efficient tool for analyzing longitudinal data. The random effects in mixed models can be used to capture the correlations between repeated measurements within a subject. The time points are not fixed and all available data can be used in the mixed-effects model provided that data are missing at random. For this reason, we focused on applying mixed-effects models to the repeated measurements of cardiac function including heart rate, left ventricle developed blood pressure, and coronary blood flow in the gluatathione S-transferase (GSTP) gene knockout and wild-type mice following ischemia/reperfusion injury performed in the isolated, Langendorff perfused heart . Cardiac function is measured during three time periods: pre-ischemia, ischemia, and reperfusion periods. We developed piecewise nonlinear function to describe different aspects of cardiac function during each period. We applied nonlinear mixed effects models and changing point model to examine how cardiac function was altered by ischemia/reperfusioninduced injury and for comparison between mouse strains. These findings provide evidence of a new application for the mixed effects model in physiological and pharmacological studies of the heart.
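A minimal R sketch of the kind of linear mixed-effects fit that these abstracts build on, using the nlme package mentioned in this session; the data are simulated and the variable names (id, time, trt, y) are illustrative. This is not the adaptive-fitting algorithm or the piecewise nonlinear cardiac model described above.

# Simulated repeated measures: 40 subjects, 5 visits each, two arms.
library(nlme)
set.seed(1)
dat <- data.frame(
  id   = factor(rep(1:40, each = 5)),
  time = rep(0:4, times = 40),
  trt  = rep(c("control", "active"), each = 100)
)
dat$y <- 10 +
  (0.5 + rep(rnorm(40, sd = 0.3), each = 5)) * dat$time -   # subject-specific slopes
  1.0 * (dat$trt == "active") * dat$time +                  # treatment-by-time effect
  rep(rnorm(40, sd = 2), each = 5) +                        # subject-specific intercepts
  rnorm(200, sd = 1)                                        # residual error
# Random intercept and slope per subject; REML is the nlme default.
fit <- lme(y ~ time * trt, random = ~ time | id, data = dat)
summary(fit)
VarCorr(fit)   # estimated random-effects covariance (the matrix G discussed above)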

Session 15: Statistical Methods in Biomarker Discovery Efficient Two-Stage Smoothing Estimation Methods in Semivarying Ordinary Differential Equation Models with Application to Influenza Dynamics � Hongqi Xue1, Arun Kumar2 and Hulin Wu1 1 University of Rochester 2 Trinity University Hongqi [email protected] We propose a new class of two-stage smoothing estimation methods for semivarying coefficient ordinary differential equation (ODE) models. The new method exploits the form of numerical discretization algorithms for an ODE solver to formulate estimating equations, and at the second stage uses more data points than the observations. The asymptotic properties of the proposed estimators are established based on a Z-theorem that we extend. By simulation studies, we show that the proposed method performs better than the original two-stage smoothing estimation method, especially in small-sample cases. The method is also applied to a real influenza data example. Kernel Estimation for Three Way ROC Surface � Chenqging Wu1, Liansheng Tang2 and Pang Du3 1 Yale School of Public Health 2 George Mason University 3 Virginia Polytechnic Institute and State University [email protected] In this manuscript, we studied the asymptotic properties of kernel estimators of three-dimensional ROC surfaces. The estimators were used to compare two or more ROC surfaces. The proposed method was evaluated by simulation studies and applied to real data. Semiparametric Time-Dependent ROC Models for Evaluating the Prognostic Accuracy of Biomarkers � Nan Hu1 and Xiao-Hua Zhou2 1 University of Utah 2 University of Washington


[email protected] ROC curves are commonly used for visualizing sensitivity and specificity of a continuous biomarker or diagnostic test result, Y , for a binary disease outcome D. In practice, however, many disease outcomes depend on time and it is appropriate to derive the corresponding time-dependent ROC curves. In this paper, we motivate the time-dependent ROC curve using examples and review the previous statistical methods. Then, two proposed semiparametric methods are outlined. The first method is proposed to use semiparametric regression approach to estimate the covariate-adjusted time-dependent ROC curves by modeling timedependent sensitivities and false positive rates (FPRs), based on a transformation model for the event time, T, and a semi-parametric location model for the biomarker, Y . We call this approach the indirect time-dependent ROC regression method. The second semiparametric regression approach is a directly method for the time-dependent ROC curve. We call this approach the direct time-dependent ROC regression method. In this work, We give our outlines for the approaches. In this paper, we also proposed a new time-dependent prognostic accuracy, namely, the interval time-dependent true positive rate (TPR). Simultaneously Comparing Accuracy among Clustered Diagnostic Markers, with Applications to the BioCycle Study Liansheng Tang George Mason University [email protected] The accuracy of diagnostic markers plays an important role in many areas. This article presents two simultaneous procedures for testing accuracy among clustered diagnostic biomarkers. The first procedure is a test of homogeneity among biomarkers which has been scantly discussed. The test is based on a global hypothesis of the same accuracy. Given reasonable distributional assumptions, the results under the alternatives provide the power analysis and sample size calculation. The second procedure is a simultaneous pairwise comparison test based on generalized ROC statistics. This test is particularly useful if global difference among biomarkers is found by the homogeneity test. We apply our procedures to a clinical trial at NICHD designed to assess and compare the accuracy of hormone and oxidative stress biomarkers in distinguishing women with ovulatory menstrual cycles from those without.
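For orientation, the empirical two-class ROC curve and AUC underlying all of these extensions can be computed in a few lines of base R; the marker values below are simulated, and this is only the standard binary-outcome construction, not the three-way ROC surface or the time-dependent ROC regressions described above.

# Simulated marker values for diseased (d = 1) and non-diseased (d = 0) subjects.
set.seed(2)
y <- c(rnorm(100, mean = 1), rnorm(100, mean = 0))
d <- c(rep(1, 100), rep(0, 100))
cuts <- sort(unique(y), decreasing = TRUE)
tpr <- sapply(cuts, function(cc) mean(y[d == 1] >= cc))   # sensitivity
fpr <- sapply(cuts, function(cc) mean(y[d == 0] >= cc))   # 1 - specificity
plot(fpr, tpr, type = "l", xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)
# The empirical AUC equals the Mann-Whitney statistic.
auc <- mean(outer(y[d == 1], y[d == 0], ">")) +
  0.5 * mean(outer(y[d == 1], y[d == 0], "=="))
auc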

Session 16: Advancing Statistical Methodology for Handling Missing Data in Longitudinal Studies Kernel Regression and Differential Equations Willard John Braun University of Western Ontario [email protected] This talk is concerned with an aspect of the problem in which subject-specific effects are modelled nonparametrically with a smooth function which satisfies or approximately satisfies a differential equation. Use of higher-order local polynomial regression is often problematic in cases of data sparsity (which might be a result of data missing at random, for example). By exploiting the differential equation appropriately, a procedure more in line with local constant regression can be employed, partially circumventing the missing data problem. The technique is illustrated with a problem involving growth curves. A Weighted Simulation-Based Estimator for Incomplete Longitudinal Data � Liqun Wang and He Li

Abstracts University of Manitoba [email protected] We propose a weighted simulation-based estimator for generalized linear mixed models with monotone missing-at-random response data. This method relies on the first two marginal moments of the responses and allows the random effects to have flexible parametric distributions which is necessarily normal. We use the so-called simulation-by-parts technique to achieve consistency and asymptotic normality for the proposed estimator without requiring the simulation size increasing to infinity. We also use the inverse probability weighting to incorporate monotone missing-at-random response data. Testing and Interval Estimation for Two-Sample Survival Comparisons with Small Sample Sizes and Unequal Censoring � Rui Wang, Stephen Lagakos and Robert Gray Harvard School of Public Health [email protected] While the commonly used log-rank test for survival times between 2 groups enjoys many desirable properties, sometimes the log-rank test and its related linear rank tests perform poorly when sample sizes are small. Similar concerns apply to interval estimates for treatment differences in this setting, though their properties are less well known. Standard permutation tests are one option, but these are not in general valid when the underlying censoring distributions in the comparison groups are unequal. We develop 2 methods for testing and interval estimation, for use with small samples and possibly unequal censoring, based on first imputing survival and censoring times and then applying permutation methods. One provides a heuristic justification for the approach proposed recently by Heinze and others (2003, Exact log-rank tests for unequal follow-up. Biometrics 59, 1151-1157). Simulation studies show that the proposed methods have good Type I error and power properties. For accelerated failure time models, compared to the asymptotic methods of Jin and others (2003, Rank-based inference for the accelerated failure time model. Biometrika 90, 341-353), the proposed methods yield confidence intervals with better coverage probabilities in small-sample settings and similar efficiency when sample sizes are large. The proposed methods are illustrated with data from a cancer study and an AIDS clinical trial. A Comparison of Power Analysis Methods for Evaluating Effects of a Predictor on Slopes in Longitudinal Designs with Missing Data � Cuiling Wang, Charles B. Hall and Mimi Kim Department of Epidemiology and Population Health, Albert Einstein College of Medicine of Yeshiva University [email protected] In many longitudinal studies, evaluating the effect of a binary or continuous predictor variable on the rate of change of the outcome, i.e., slope, is often of primary interest. Sample size determination of these studies, however, is complicated by the expectation that missing data will occur due to missed visits, early drop out and staggered entry. Despite the availability of methods for assessing power in longitudinal studies with missing data, the impact on power of the magnitude and distribution of missing data in the study population remain poorly understood. As a result, simple but erroneous alterations of the sample size formulae for complete/balanced data are commonly applied. These “naive” approaches include such as the average sum of squares (ASQ) and average number of subjects (ANS) methods. 
The goal of this paper is to explore in greater detail the effect of missing data on study power and compare the performance of naive sample size methods to a correct maximum likelihood based method using both mathematical and simulation based approaches. Two different longitudinal aging studies are used to illustrate the methods.
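A rough simulation-based power calculation of the kind contrasted with the naive formulas above might look as follows in R; the effect size, variance components, dropout rate, and the use of nlme::lme are illustrative assumptions, not values or methods taken from the two aging studies.

library(nlme)
one_trial <- function(n = 60, times = 0:3, slope_diff = 0.4, drop = 0.15) {
  dat <- expand.grid(time = times, id = 1:n)
  dat$trt <- as.numeric(dat$id > n / 2)
  b0 <- rnorm(n, sd = 1); b1 <- rnorm(n, sd = 0.3)            # random intercepts/slopes
  dat$y <- 5 + b0[dat$id] +
    (0.5 + b1[dat$id] + slope_diff * dat$trt) * dat$time + rnorm(nrow(dat), sd = 1)
  last <- pmin(rgeom(n, prob = drop), max(times))              # monotone dropout time
  dat <- dat[dat$time <= last[dat$id], ]
  fit <- try(lme(y ~ time * trt, random = ~ time | id, data = dat), silent = TRUE)
  if (inherits(fit, "try-error")) return(NA)
  summary(fit)$tTable["time:trt", "p-value"] < 0.05            # reject at the 5% level?
}
set.seed(3)
mean(replicate(200, one_trial()), na.rm = TRUE)                # Monte Carlo power estimate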

Session 17: Developments and Applications of Models with Time-Varying Covariates or Coefficients Some Applications of Time-Varying Covariates by U.S. Food and Drug Administration Reviewers John Lawrence U.S. Food and Drug Administration [email protected] I will show a few examples of how Cox regression models with time-varying covariates have been used in my experience at the FDA. One example involves use of a concomitant medication whose use and dose changes over time among subjects. Another involves measurements that are not clinical outcomes, but may be affected by the treatment and may be predictive of the clinical outcome measurement. Time-Varying Covariate Adjustment in Time-to-Event Data Analysis Julia Wang Johnson & Johnson [email protected] In order to get a more accurate treatment comparison, time-varying covariate analysis is increasingly being conducted by sponsors and/or requested by regulatory agencies. This presentation will demonstrate a recent application of time-varying covariate adjustment in a Cox regression analysis. Graphical Presentation for the Cox Model with Time-Varying Covariates � Urania Dafni1 and Dimitris Karlis2 1 University of Athens 2 Athens University of Economics and Business [email protected] Estimating survival curves in a non-parametric way via the use of Kaplan-Meyer probabilities is common practice. In the clinical trial setting, in the case of randomized treatments, this approach is widely applied and its properties are well understood. In cases where the treatment varies across time (e.g. switching to another therapy) attempts have been made based on the classical KaplanMeyer to estimate the survival curve for the group of patients that changed treatment. For these cases the time varying Cox model can estimate efficiently and without bias the treatment effect, but no information about the survival curves is given. The aim of this work is to fill the gap by considering an appropriate estimate for the survival curve and thus to visualize the survival curve when treatment varies across time. Real data examples and alternative approaches to graphically present the time-varying covariate model will be discussed. An alternative estimate for the survival curve based on similar conditional arguments as in the typical Kaplan-Meyer but taking into account the survival and treatment history of the patients, is developed. Properties of the proposed estimator are discussed. Simulation evidence is also provided. The proposed survival estimates allow the extraction of information on the treatment effect for those that changed treatment and hence provide an improved estimate for the differential treatment benefit. The usage of these curves is exploited for deriving more accurate information on the effects of two initially randomized treatments, making use of the whole history of the patients.
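The usual way such time-varying covariates enter a Cox model in practice is through the counting-process (start, stop] data layout; a small sketch with R's survival package and its Stanford heart transplant data, in which transplant status changes over follow-up, is shown below. It is of course not the regulatory or sponsor analyses themselves.

library(survival)
# heart: Stanford heart transplant data, already split into (start, stop]
# intervals within which the time-dependent covariate `transplant` is constant.
head(heart)
fit <- coxph(Surv(start, stop, event) ~ transplant + age + surgery, data = heart)
summary(fit)   # hazard ratio for the time-varying transplant status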


Abstracts (�Student Paper Award) Varying Coefficient Models for Modeling Diffusion Tensors Along White Matter Fiber Bundles � Ying Yuan, Hongtu Zhu, Martin Styner, John H. Gilmore and J. S. Marron University of North Carolina at Chapel Hill [email protected] Diffusion tensor imaging (DTI) provides important information on tissue structure and orientation of major fiber bundles in brain white matter in vivo. It results in a three dimensional grid of tensors, which are 3 × 3 symmetric positive definite (SPD) matrices. This paper develops a functional data analysis framework to model diffusion tensors along fiber bundles as functional responses with a set of covariates of interest, such as age, diagnostic status and gender. This framework has a wide range of clinical applications including the characterization of normal brain development, the neural bases of neuropsychiatric disorders, and the joint effects of environmental and genetic factors on white matter fiber bundles. A challenging statistical issue is how to appropriately handle diffusion tensors along fiber bundles as functional data in a Riemannian manifold. We propose a statistical model with varying coefficient functions, called VCTF to characterize the dynamic association between functional SPD matrix-valued responses and covariates. We calculate a weighted least squares estimation of the varying coefficient functions under the Log-Euclidean metric in the space of SPD matrices. We also develop a global test statistic to test specific hypotheses about these coefficient functions and construct their simultaneous confidence bands. Simulated data are further used to examine the finite sample performance of the estimated varying coefficient functions. We apply our VCTF to study potential gender differences and find statistically significant aspect of the development of diffusion tensors along the right internal capsule tract in a clinical study of neurodevelopment.

Session 18: Mixture Models A New Nuisance Parameter Elimination Method with Application to Unordered Homologous Chromosome Pairs Problem � Pengfei Li1 and Jing Qin2 1 University of Alberta 2 National Institute of Allergy and Infectious Diseases [email protected] Motivated by applications of case-control model or exponential tilting model in the unordered homologous chromosome pairs in genetic studies and in the interim analysis in double blinded clinical trials, we develop a new nuisance parameter elimination method based on the empirical Shannon’s mutual information. The asymptotic behaviors of the maximum empirical Shannon’s mutual information estimation and the empirical Shannon’s mutual information test are similar to the maximum likelihood estimation and the likelihood ratio test, respectively. Interestingly we have found a connection between the empirical Shannon’s mutual information and the profile empirical likelihood (Owen 1988) under some constraints. For testing the null hypothesis that the unordered pairs come from the same distribution, the maximum Shannon’s mutual information estimation has a degenerate information matrix. As a result we have to expand the empirical Shannon’s mutual information test statistic up to the fourth order for finding the limiting distribution of the mixture of a distribution with point mass at zero and a chi-square distribution with one degree freedom. A real genetic data set is employed for illustration.


Nonparametric Estimation in Multivariate Mixture Models Hsiao-Hsuan Wang1 , � Yuejiao Fu1 and Jing Qin2 1 York University 2 National Institute of Allergy and Infectious Diseases, Biostatistics Research Branch [email protected] We consider a special multivariate mixture model which is motivated by the application to social sciences where repeated measurements are often available for each subject. We investigated an empirical likelihood estimation in a multivariate two-component mixture model. We treated three-variate mixtures in detail and extended our method to high-dimensional mixtures. In a similar spirit of the composite likelihood, we used the likelihood of some special selected data to do approximate inference. The efficiency of the method was demonstrated through a real data example as well as simulation studies. On Exchangeability and Mixtures of Normals Xinxin Jiang Suffolk University [email protected] In this talk, I’ll make reasonable connection between exchangeability and mixtures of normals. Some statistical testing questions (such as, how to test exchangeability v.s. independence) will be proposed. In the later part of the talk, I’ll introduce a special type of mixture of normals, so-called q-Gaussians in statistical mechanics, and some interesting testing problems related to this type of distribution under exchangeable environment. (Generating q-gaussian r.v.’s, finding maximum likelihood estimate for q, and testing q-gaussians under exchangeability.) Lack of Sufficiency in Mixture Problems Yongzhao Shao New York University [email protected] We discuss the issue of a lack of sufficiency in inference for the number of components in finite mixture mixtures. Some local likelihood based methods for the inference problem are suggested, and some theoretical and numerical results are provided.
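As background to these mixture abstracts, the generic EM iteration for a two-component univariate normal mixture takes only a few lines of base R; the data are simulated, and this sketch is unrelated to the empirical Shannon information, empirical likelihood, or q-Gaussian developments described above.

set.seed(5)
x <- c(rnorm(150, 0, 1), rnorm(100, 3, 1))      # simulated two-component data
p <- 0.5; mu <- c(-1, 1); s <- c(1, 1)           # crude starting values
for (iter in 1:200) {
  d1 <- p * dnorm(x, mu[1], s[1])
  d2 <- (1 - p) * dnorm(x, mu[2], s[2])
  w  <- d1 / (d1 + d2)                           # E-step: membership probabilities
  p  <- mean(w)                                  # M-step: mixing proportion
  mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
  s  <- c(sqrt(sum(w * (x - mu[1])^2) / sum(w)),
          sqrt(sum((1 - w) * (x - mu[2])^2) / sum(1 - w)))
}
round(c(p = p, mu = mu, sd = s), 3)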

Session 19: Design and Analysis of Biomedical Studies Application of Bayesian Methods with Singular Value Decomposition in Genome-Wide Association Study Soonil Kwon and � Xiuqing Guo Cedars-Sinai Medical Center [email protected] Genetic variants over the entire genome are evaluated in genome-wide association studies (GWAS) to identify susceptibility genes for disease development. Several hundred thousand to millions (m) of single nucleotide polymorphisms (SNPs) are studied on a relatively small number of study participants (n), which leads to the m >> n situation that cannot be analyzed by traditional regression and/or logistic regression methods. Current GWAS analysis is usually implemented via analyzing one SNP at a time, which leads to a huge multiple testing problem. To analyze all SNPs simultaneously when m >> n, we developed the Bayesian classification with singular value decomposition (BCSVD) method for qualitative traits, and the Bayesian regression with SVD (BRSVD) method for quantitative traits, respectively. Both methods achieve massive dimension reduction through SVD and fit the models by Markov chain Monte Carlo with a Gibbs sampler, which can be constructed from the posterior densities derived with non-informative and conjugate priors for

Abstracts BCSVD and BRSVD, respectively. Permutation test was incorporated to obtain empirical p-values. Applying the BCSVD method, we successfully identified the genetic variants that are associated with Rheumatoid Arthritis (RA) status in the simulated RA data provided by the Genetic Analysis Workshop 15. Using the Genetic Analysis Workshop 17 sequence data, we showed that the BRSVD method works better than the single SNP association test and/or the penalized regression method for the evaluation of association between a quantitative trait and sequence data. In conclusion, we showed that the BCSVD and BRSVD methods are useful tools for identifying genetic determinants in GWAS with relative small to modest sample size. Analyze Multivariate Phenotypes in Genetic Association Studies by Combining Univariate Association Tests Qiong Yang Boston University [email protected] Multivariate phenotypes are frequently encountered in genomewide association studies(GWAS). Such phenotypes contain more information than univariate phenotypes, but how to best exploit the information to increase the chance of detecting genetic variant of pleiotropic effect is not always clear. Moreover, when multivariate phenotypes contain a mixture of quantitative and qualitative measures, limited methods are applicable. In this paper, we first evaluated the approach originally proposed by O’Brien and by Wei and Johnson that combines the univariate test statistics and then we proposed two extensions to that approach. The original and proposed approaches are applicable to a multivariate phenotype containing any type of components including continuous, categorical and survival phenotypes, and applicable to samples consisting of families or unrelated samples. Simulation results suggested that all methods had valid type I error rates. Our extensions had a better power than O’Brien’s method with heterogeneous means among univariate test statistics, but were less powerful than O’Brien’s with homogeneous means among individual test statistics. All approaches have shown considerable increase in power compared to testing each component of a multivariate phenotype individually in some cases. We apply all the methods to GWAS of serum uric acid levels and gout with 550,000 SNPs in the Framingham Heart Study. Sample Size Analysis for Pharmacogenetic Studies � Chi-hong Tseng1 and Yongzhao Shao2 1 University of California, Los Angeles 2 New York University [email protected] Pharmacogenetic studies identify the genetic factors that in uence the inter-subject variation in drug response. This paper proposes a general framework to determine sample size in pharma- cogenetic studies. Simple closed form solutions for the sample size are derived for continuous and binary outcomes. To extend the application to pharmacogenomic studies, where a large number of genetreatment interactions are evaluated simultaneously, we advocate the use of false discov- ery rate (FDR) in controlling false positive proportion. We adapt the method proposed by Shao and Tseng (2007) to facilitate adjustment for correlation among multiple tests for better control of false positives and power. A real example is given and simulation studies are carried out to demonstrate the performance of the proposed method. Estimating Transitional Probabilities of Disease Status in Longitudinal Studies with Two-Phase Sampling Sujuan Gao ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Indiana University [email protected] Many large cohort studies on Alzheimer's Disease use the two-phase sampling design to identify subjects with higher probability of dementia and cognitive impairment (CI) during the screening phase, followed by clinical assessments in a sub-sample at the second phase. While the two-phase design is efficient at identifying subjects with dementia and CI, it introduces statistical challenges in estimating disease transition rates, so that adjustment has to be made to account for the sampling schemes. We propose a multi-state Markov model which models transitional probabilities from normal cognitive state, to cognitive impairment, and to dementia. Transitional probabilities were modeled using subjects' demographics and cognitive scores obtained during the first phase at each follow-up evaluation. The EM algorithm was used to obtain maximum likelihood estimates with missing clinical assessment data at the second phase. Predicted transitional probabilities were used to derive age-specific incidence estimates. The jackknife resampling method was used to derive confidence intervals for the estimated incidence. We illustrate our methods using data from the Indianapolis-Ibadan Dementia Project.
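For readers new to multi-state models, the simplest fully observed analogue of the estimation problem above is row-normalizing a table of one-step transition counts; the counts below are invented, and the sketch ignores the two-phase sampling, covariates, and missing second-phase assessments that the EM approach above is designed to handle.

states <- c("normal", "CI", "dementia")
counts <- matrix(c(820, 60, 10,      # transitions observed between consecutive visits
                     0, 150, 35,     # (progressive model: no backward transitions)
                     0,   0, 90),
                 nrow = 3, byrow = TRUE, dimnames = list(from = states, to = states))
P <- prop.table(counts, margin = 1)  # empirical one-step transition probabilities
round(P, 3)
round(P %*% P, 3)                    # two-visit-ahead probabilities if time-homogeneous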

Session 20: Bridging and multi-regional clinical trials Statistical Challenges and Lessons Learned from MultiRegional Trials Daphne T.Y. Lin U.S. Food and Drug Administration, CDER/OB/Division of Biometrics IV [email protected] With globalization of drug development, large multi-regional clinical trials are becoming the common approaches. These trials present considerable challenges in quality, design, implementation, analysis, and interpretation. With different intrinsic and extrinsic factors and different health authority requirements, there is a clear need to develop a road map in order to reach successful completion of a quality multi-regional clinical trial. In this talk, we will discuss several new drug applications (NDAs) which were recently presented to the FDA Advisory Committees. It is very challenging to interpret the trial results if the consistent treatment effects across region were not observed. For example, if the treatment effect is driven by data from one region which represents only 10% of the whole dataset, or if the treatment effect is driven by non-US data only. The case studies will be discussed in details and explore possible reasons for the regional treatment difference. Lessons learned from these case examples with some suggestions for future study design and implementation will be shared. Establishing Consistency Across All Regions in a MultiRegional Clinical Trial � Chin-Fu Hsiao1 , Hsiao-Hui Tsou1 , H.M. James Hung2 , Yue-Ming Chen1 , Wong-Shian Huang1 and Wan-jung Chang1 1 National Health Research Institutes, Taiwan 2 U.S. Food and Drug Administration [email protected] In recent years, global collaboration has become a conventional strategy for new drug development. To accelerate the development process and shorten approval time, the design of multi-regional trials incorporates subjects from many countries/regions around the world under the same protocol. After showing the overall efficacy of a drug in a global trial, one can also simultaneously evaluate the


Abstracts possibility of applying the overall trial results to all regions and subsequently support drug registration in each region. However, most of the recent approaches developed for the design and evaluation of multi-regional clinical trials (MRCTs) focus on establishing criteria to examine whether the overall results from the MRCT can be applied to a specific region. In this paper, we use the consistency criterion of Method 1 from the MHLW guidance to assess whether the overall results from the MRCT can be applied to all regions. Sample size determination for the MRCT is also provided to take all the consistency criteria from each individual region into account. Numerical examples are given to illustrate applications of the proposed approach. Evaluation of Regional Treatment Effect � Yi Tsong1 , W-J Chang2 , Xiaoyu Dong3 and Hsiao-Hui Tsou2 1 U.S. Food and Drug Administration 2 National Health Research Institutes, Taiwan 3 University of Maryland Baltimore County [email protected] The primary objective of a multiregional trial is to demonstrate efficacy of a test treatment in participating regions overall while also establish the treatment efficacy at each region of interest. In order to do that Japanese regulatory authority MHLW recommended two methods to determine the sample size requirement for the region of interest. The consistency method requires a regional sample size for a large probability of showing consistency of the sample mean treatment effect between the region of interest and the study overall. On the other hand, the regional efficacy method requires a regional sample size for a large probability of showing trend of efficacy based on regional sample mean. In this presentation, we propose modified approaches of consistency and efficacy methods proposed in MHLW. With a modified consistency method, we propose to test regional consistency and region efficacy based on testing with type I error rate adjusted for pre-specified power and regional sample size. We will illustrate the approach with examples and sample size determination.
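One way to get a feel for a consistency requirement of this type is a small normal-approximation simulation: estimate the probability that the regional effect estimate retains at least a given fraction of the overall estimate. The function below is only a sketch in that spirit; the effect size, standard deviation, sample split, and the retention fraction pi0 are illustrative assumptions, not the MHLW guidance itself.

consistency_prob <- function(delta = 0.3, sd = 1, n_total = 800,
                             frac_region = 0.15, pi0 = 0.5, nsim = 1e5) {
  n_r <- round(n_total * frac_region)              # per-arm sample size in the region
  n_c <- n_total - n_r                             # per-arm sample size elsewhere
  d_reg <- rnorm(nsim, delta, sd * sqrt(2 / n_r))  # regional effect estimate
  d_oth <- rnorm(nsim, delta, sd * sqrt(2 / n_c))  # effect estimate from other regions
  d_all <- (n_r * d_reg + n_c * d_oth) / n_total   # pooled overall estimate
  mean(d_reg >= pi0 * d_all)                       # P(regional effect >= pi0 x overall)
}
set.seed(7)
consistency_prob()    # increasing frac_region raises this probability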

Session 21: Novel Approaches to the Genetic Dissection of Complex Traits Joint Analysis of Binary and Quantitative Traits with Missing Data � Gang Zheng1 , Colin O Wu1 , Wenhua Jiang2 , Jungnam Joo3 and Joao AC Lima2 1 National Heart Lung and Blood Institute 2 The Johns Hopkins University School of Medicine 3 National Cancer Center, Korea [email protected] We propose two tests for association with both binary (case-control) and quantitative (continuous) traits where some quantitative traits are missing. Under the alternative hypothesis, the missing is not at random. We propose a modified F test by incorporating the genotype frequencies of those whose traits are missing. We show that this modified F-test consists of the usual F-test for the quantitative trait data and the trend test for the binary data. Combining the correlated p-values of Pearson’s chi-squared test and the modified F test is also considered. The two proposed tests are compared with an existing joint analysis in simulations, which show the proposed tests have significantly greater power. Application to a real study of rheumatoid arthritis is presented.


A Bayesian Model for Modeling Gene-Environment Interaction � Kai Yu1 and Faming Liang2 1 National Cancer Institute 2 Texas A&M University [email protected] Once a gene or a chromosome region has been identified to be associated with a disease outcome, it is often of great interest to further understand how the gene and the environmental factor act together modifying the disease risk. Since we usually do not know which genetic markers within the gene/region are the functional ones, it is not obvious how to study the joint effect between the gene and the environmental factor. Here we present a flexible Bayesian model that tries to model the joint effect by synthesizing information from multiple genetic markers into a latent multi-level categorical variable. We evaluate the performance of the proposed method through simulation studies, and illustrate its application in a real data. Semi-parametric Pseudo-Maximum-Likelihood Estimation Exploiting Gene-Environment Independence for PopulationBased Case-Control Studies with Complex Sampling � Yan Li1 and Barry Graubard2 1 The University of Texas at Arlington 2 National Cancer Institute [email protected] Advances in human genetics have led to epidemiologic investigations not only of the effects of genes alone, but also of geneenvironment interactions (G-E). A widely accepted design strategy in the study of how G-E relate to disease risks is the populationbased case-control study (PBCCS). For simple random samples semiparametric methods for testing G-E have been developed by Chatterjee and Carroll (CC) in 2005. The use of complex sampling in PBCCS is becoming common. Two complexities, weighting and intracluster correlations of observations, are induced by the sampling. We develop pseudo semiparametric-maximum-likelihood estimators (pseudo-SPMLE) that extend the CC method to the PBCCS with complex sampling. We study the finite sample performance of the pseudo-SPMLE using simulations and illustrate the pseudoSPMLE with a case-control study of kidney cancer conducted in Detroit. Semiparametric Maximum Likelihood Methods for Estimating Genetic and Environmental Effects wtih Case-Control MotherChild Pair Data � Jinbo Chen1 , Dongyu Lin1 and Hagit Hochner2 1 University of Pennsylvania 2 Hebrew University [email protected] Case-control mother-child pair design represents a unique advantage for dissecting genetic susceptibility of complex traits because it allows the assessment of both maternal and offspring genetic compositions. This design has been widely adopted in studies of obstetric complications and neonatal outcomes. We developed efficient statistical methods for evaluating joint genetic and environmental effects on the risk of the phenotype. Adopting a logistic regression model to relate the phenotype to maternal and offspring genetic and environmental risk factors, we developed semiparametric maximum likelihood methods for the estimation of the odds ratio association parameters. Our methods are novel because they exploit two unique features of the study data for the parameter estimation. First, the correlation between maternal and offspring SNP genotypes is defined by the minor allele frequency. Second, environmental exposures are usually maternal and thus not affected by offspring genes. Our methods yield more efficient estimates compared with standard ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts prospective methods for fitting logistic regression models to casecontrol data. We demonstrated the performance of our methods through extensive simulation studies and analysis of data from the Jerusalem Perinatal Study.

Session 22: Non-/Semi-Parametric Models for Complex Data Threshold Estimation Based on a P-Value Framework � Atul Mallik1 , Bodhi Sen2 , Moulinath Banerjee1 and George Michailidis1 1 University of Michigan 2 Columbia University [email protected] We use p-values as a discrepancy criterion for identifying the threshold level at which a regression function takes off from its baseline value—a problem motivated by applications in toxicological and pharmacological dose-response studies and environmental statistics. We study the problem in two different sampling settings: one where multiple responses can be obtained at a number of different covariate-levels and the other being the standard regression setting (limited number of response values at each covariate). Our procedure involves testing the hypothesis that the regression function is at its baseline at each covariate value and then computing the (potentially approximate) p-value of the test. An estimate of the threshold is obtained by fitting a piecewise constant function with a single jump discontinuity (stump) to these observed p-values (or their surrogates), as they behave in markedly different ways on the two sides of the threshold. The estimate is shown to be consistent and its large sample properties are studied. Our approach is computationally simple and extends to the estimation of the baseline value of the regression function, heteroscedastic errors and to time series. It is illustrated on some real data applications. An M-Theorem for Bundled Parameters, with Application to the Efficient Estimation in a Linear Model for Cencored Data � Ying Ding1 and Bin Nan2 1 Eli Lilly and Company 2 Department of Biostatistics, University of Michigan [email protected] In many semiparametric models that are parameterized by two types of parameters—an Euclidean parameter of interest and an infinitedimensional nuisance parameter, the two parameters are bundled together, i.e., the nuisance parameter is an unknown function that contains the parameter of interest as part of its argument. For example, in a linear regression model for censored survival data, the unspecified error distribution function involves the regression coefficients. Motivated by developing an efficient estimating method for the regression parameters, we propose a general M-theorem for bundled parameters and apply the theorem to deriving the asymptotic theory for the sieve maximum likelihood estimation in the linear regression model for censored survival data. The numerical implementation of the proposed estimating method can be achieved through the conventional gradient-based search algorithms such as the NewtonRaphson algorithm. We show that the proposed estimator is consistent and asymptotically normal and achieves the semiparametric efficiency bound. Simulation studies demonstrate that the proposed method performs well in practical settings and yields more efficient estimates than existing estimating equation based methods. Illustration with a real data example is also provided. ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Consistent Model Selection for Marginal Generalized Additive Model for Correlated Data Lan Xue1 , Annie Qu2 and � Jianhui Zhou3 1 Oregon State University 2 University of Illinois at Urbana-Champaign 3 University of Virginia [email protected] We consider the generalized additive model when responses from the same cluster are correlated. Incorporating correlation in the estimation of nonparametric components for the generalized additive model is important since it improves estimation efficiency and increases statistical power for model selection. In our setting, there is no specified likelihood function for the generalized additive model since the outcomes could be non-normal and discrete, which makes estimation and model selection very challenging problems. We propose consistent estimation and model selection which incorporate the correlation structure. We establish an asymptotic property with L 2-norm consistency for the nonparametric components, which achieves the optimal rate of convergence. In addition, the proposed model selection strategy is able to select the correct generalized additive model consistently. That is, with probability approaching to 1, the estimators for the zero function components converge to 0 almost surely. We will illustrate our method using numerical studies with both continuous and binary responses, and a real data application of binary periodontal data. Analysis of Disease Progression Data via Progressive MultiState Models under Nonignorable Inspection Processes Baojiang Chen1 , � Grace Yi2 and Richard Cook2 1 University of Nebraska Medical Center 2 University of Waterloo [email protected] Irreversible multi-state models provide a convenient framework for characterizing disease processes that arise when the states represent the degree of organ or tissue damage incurred by a progressive disease. In many settings, however, individuals are only observed at periodic clinic visits and so the precise times of the transitions are not observed. If the life history and observation processes are not independent, the observation process contains information about the life history process, and the likelihood inference based on the disease process alone is often invalid. This talk concerns the analysis of data from progressive multi-state disease processes in which individuals are scheduled to be seen at periodic pre-scheduled assessment times. We cast the problem in the framework used for incomplete longitudinal data problems. Maximum likelihood estimation via an EM algorithm is advocated for parameter estimation. Numerical studies will be presented to assess the performance of the proposed method, and data from a cohort of patients with psoriatic arthritis will be analyzed.
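For comparison with the marginal generalized additive models above, an ordinary (independence-assuming) additive logistic fit can be obtained with the mgcv package in R; the data are simulated, and this sketch does not incorporate the within-cluster correlation or the consistent selection procedure that the abstract develops.

library(mgcv)
set.seed(8)
n  <- 400
x1 <- runif(n); x2 <- runif(n)
eta <- sin(2 * pi * x1) + 0.5 * (x2 - 0.5)   # smooth effects only, no cluster structure
y   <- rbinom(n, 1, plogis(eta))
fit <- gam(y ~ s(x1) + s(x2), family = binomial, select = TRUE)
summary(fit)     # select = TRUE allows whole smooth terms to be shrunk toward zero
plot(fit, pages = 1)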

Session 23: Time to Event Data Analysis Comparison of Two Crossing Hazard Rate Functions Peihua Qiu University of Minnesota [email protected] Motivated by a clinical trial in which zinc nasal spray is the primary treatment for viral respiratory illness, we consider comparison of two hazard rate functions which may cross each other. A number of existing procedures for handling this problem only consider the alternative hypothesis with crossing hazard rates; many other realistic cases are excluded from consideration. We propose a two-stage


Abstracts procedure that considers all possible alternatives, including the ones with crossing or running parallel hazard rate functions. To define its significance level and p-value properly, a new procedure for handling the crossing hazard rates problem is suggested, which has the property that its test statistic is asymptotically independent of the test statistic of the logrank test. We show that the two-stage procedure, with the logrank test and the suggested procedure for handling the crossing hazard rates problem used in its two stages, performs well in applications. This is a joint research with Dr. K. Liu and Dr. J. Sheng. Robust Parameter Estimation in a Semiparametric Model for Case-Control Studies � Jingjing Wu1 , Rohana Karunamuni2 and Biao Zhang3 1 University of Calgary 2 University of Alberta 3 The University of Toledo [email protected] We investigate the estimation problem of parameters in a twosample semiparametric model. Specifically, let X 1, dots, X n be a sample from a population with distribution function G and density function g. Independent of the X i’s, let Z 1, dots, Z m be another random sample with distribution function H and density function h(x) = exp[alpha + r(x)η]g(x), where alpha and η are unknown parameters of interest and g is an unknown density. This model has wide applications in the logistic discriminant analysis, survival analysis, case-control studies, and receiver operating characteristic curves analysis. Furthermore, it can be considered as a biased sampling model with weight function depending on unknown parameters. In this paper, we construct minimum Hellinger distance estimators of alpha and η. The proposed estimators are chosen to minimize the Hellinger distance between a semiparametric model and a nonparametric density estimator. Theoretical properties such as the existence, strong consistency and asymptotic normality are investigated. Robustness of proposed estimators is also examined using a Monte Carlo study. Response Adaptive Randomization for Delayed Survival Outcome with a Short-term Outcome � Mi-Ok Kim1 , Chunyan Liu1 and Jack Lee2 1 Cincinnati Children’s Hospital Medical Center 2 The University of Texas MD Anderson Cancer Center [email protected] We consider an application of response-adaptive randomization (RAR) design in a clinical trial where the primary endpoint takes a long time to observe but a short-term “surrogate”ˇt outcome is available. The asymptotic properties of the design have been shown little affected when the delay is not very long relative to the entry time intervals (Bai et al, 2003; Biswas, 2003; Hu et al, 2008). These theoretical results, however, are not applicable when the delay is long or delay mechanisms differ by the short-term outcomes as in many survival trials. Huang et al (2009) recently proposed a Bayesian approach of modeling the relationship between the surrogate and the primary response. We first show that an adaptive design without accounting for the short-term outcome is biased when delay mechanisms differ by the short-term outcome. We then study how the delay affects the RAR design and when utilizing the short-term outcome is more beneficial by comparing Huang et al (2009)’s joint modeling approach with an approach without utilizing the shortterm outcome. Simulation results show that the more complex Huang et al’s approach performs better when the delay is longer and the surrogate is informative.
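The first-stage ingredient of such a two-stage comparison is the ordinary log-rank test, which is easy to reproduce on simulated crossing-hazards data with the survival package; the Weibull shapes and censoring below are illustrative, and the second-stage statistic proposed in the abstract is not implemented here.

library(survival)
set.seed(10)
n <- 200
grp  <- rep(0:1, each = n / 2)
time <- c(rweibull(n / 2, shape = 0.7, scale = 2),   # shapes chosen so hazards cross
          rweibull(n / 2, shape = 1.8, scale = 2))
cens <- runif(n, 0, 4)
obs    <- pmin(time, cens)
status <- as.numeric(time <= cens)
survdiff(Surv(obs, status) ~ grp)      # log-rank test: weak against crossing hazards
plot(survfit(Surv(obs, status) ~ grp), lty = 1:2,
     xlab = "Time", ylab = "Survival probability")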


The “Modified Covariate” Method for Detecting Interactions between a Treatment and a Large Number of Predictors � Lu Tian, A. Alizadeh, A. Gentles and R. Tibshirani Stanford University [email protected] We consider a setting in which we have a treatment and a large number of covariates for a set of observations, and wish to model their relationship with an outcome of interest. We propose a simple method for modeling interactions between the treatment and covariates. The idea is to modify the covariates in a simple way, and then fit a standard model using the modified covariates and no main effects. We show that this method produces valid inferences in a variety of settings. It can be useful for personalized medicine: determining from a large set of biomarkers the subset of patients that can potentially benefit from a treatment. We apply the method to both simulated datasets and to gene expression studies of cancer. The modified data can be used for other purposes, for example large scale hypothesis testing for determining which of a set of covariates interact with a treatment variable.
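The covariate modification described above can be sketched in a few lines of R for a continuous outcome: code treatment as T = +/-1, multiply each covariate by T/2, and fit a working model with no main effects. The simulated data, the choice of a plain linear model, and the variable names are illustrative; the published approach also covers binary and survival outcomes and regularized fits.

set.seed(9)
n <- 300; p <- 5
X   <- matrix(rnorm(n * p), n, p)
Trt <- sample(c(-1, 1), n, replace = TRUE)          # randomized treatment coded +/- 1
y   <- 1 + 0.5 * X[, 2] + 0.8 * X[, 1] * (Trt / 2) + rnorm(n)   # benefit depends on X1
W   <- X * (Trt / 2)                                # modified covariates
fit <- lm(y ~ W)                                    # no main effects of X or Trt
coef(summary(fit))                                  # coefficients on W estimate interactions
score <- X %*% coef(fit)[-1]                        # subject-specific treatment-benefit score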

Session 24: Adaptive Design in Clinical Trials Subset Selection for Comparative Clinical Selection Trials Cheng-Shiun Leu, Ying-Kuen Cheung and � Bruce Levin Department of Biostatistics, Columbia University [email protected] When several treatment regimens are possible candidates for a large phase III study, but too few resources are available to evaluate each relative to a standard, conducting a multi-arm randomized selection trial is a useful strategy to remove inferior treatments from further consideration. In this talk we discuss a class of sequential procedures designed to select a subset of treatments that offer clinically meaningful improvements over the control group, or to declare that no such subset exists. The proposed procedures are easy to implement, allow sequential elimination of inferior treatments and sequential recruitment of promising treatments while preserving the type I error rate. Inference Following Adaptive Biased Coin Designs Steve Coad Queen Mary, University of London [email protected] Suppose that two treatments are being compared in a clinical trial. Then, if complete randomisation is used, the next patient is equally likely to be assigned to one of the two treatments. So this randomisation rule does not take into account the previous treatment assignments, responses and covariate vectors, and the current patient’s covariate vector. The use of an adaptive biased coin which takes some or all of this information into account can lead to a more powerful trial. The different types of such designs which are available are reviewed and the consequences for inference discussed. Issues related to both point and interval estimation will be addressed. On an Adaptive Procedure for Selecting among Bernoulli Populations Pinyuen Chen Syracuse University [email protected] This talk considers an adaptive procedure for selecting among a set of Bernoulli populations in terms of their probabilities of success. We use the idea of strong curtailment to define a hybrid selection and testing procedure for finding the best among several experimental Bernoulli populations, provided that it is better than ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts a control. Our procedure is based on the fixed-sample-size selection and testing procedure proposed by Thall, Simon, and Ellenberg (1988) Two-Stage Selection and Testing Designs for Comparative Clinical Trials, Biometrika, 75, 303-310. The curtailment procedure achieves the same power as the fixed-sample-size procedure for any significance level with a smaller total number of observations and a smaller expected number of observations under broad parameter configurations. The derivations of the probability of a correct selection and the least favorable configurations are exact, without making use of the normal approximation to binomial. Comparisons with the previous work are made through exact calculations and simulation. The Levin-Robbins-Leu Random Subset Size Selection Procedure � Cheng-Shiun Leu and Bruce Levin Columbia University [email protected] We introduce a family of sequential selection procedures for random subset size identification problems in binomial populations. We discuss how to design an experiment using such procedures in order to guarantee the probability of correct selection and/or acceptable selection to the required level, say P*. Unlike S.S. Gupta’s subset selection procedure, the new procedure can control the probabilities of correct selection for subsets of size greater than one. Simulation studies are also presented to compare the proposed procedure to other procedures.

Session 25: Model Selection and Its Application in Clinical Trial Design A Non-Inferiority Trial Design without the Need for a Conventional Margin � Xi Chen1 and Hamparsum Bozdogan2 1 PharmClint Co. 2 University of Tennessee, Knoxville [email protected] The model selection methodology in information theory has been introduced into the design of the non-inferiority (NI) trial. The new trial set up eliminates the dependency on the conventional NI margin, and it explicitly uses the minimum clinical important difference (MCID) that links the statistical analysis to the clinical sense. Different from the conventional trial design, the new methodology is self-adaptive to the change in the sample size and overall cure rate, and it has an asymptotic property. It is shown that MCID is de-composite into constant MCID and statistical MCID, and along with this concept, the interpretation of the trial result is more accurate and consistent to the statistical theory as well as the clinical interpretations. The model selection methodology has revived the concept of equivalence in confirmative trial set up. Besides, along with this methodology, given the sample size to be large enough to cover the variation of the pretrial assumptions, the blinded interim analysis to adjust sample size for NI trial may no longer be necessary. The concept of the overpowered study is not applicable to this trial design. Shifting Model and Its Application in Unified Clinical Trial Analysis Xi Chen1 and � Jie Tang2 1 PharmClint Co. 2 Pfizer Inc. [email protected] ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Along with a non-zero constant MCID (minimum clinical important difference), the concepts of the strict clinical trial outcomes have been expanded into the general trial outcomes, including aggressive and conservative claims. These new trial outcomes are able to serve as trial goal or to provide additional information to the trial goal. Besides, the concepts of trial goal and trial outcomes have been separated, and it is shown that the trial goal may not be the most accurate interpretation of the trial outcome. The combination of the test result to trial goal and the observed trial outcome may provide more objective interpretation of the trial result. Time to Conclusion for Phase II Cancer Trials and Its Implication to Trial Design � 1

Ying Lu1 and Shenghua Kelly Fan2

VA Palo Alto Health Care System and Stanford University

2

California State University, East Bay [email protected] Phase II cancer trials evaluate total response (TR) rate using a single-arm open-label design. Simon’s two-stage design minimizes either the expected or the maximum number of patients. In this paper, we investigate distribution of time to reaching trial conclusion of such two-stage trials, which considers the accrual rate, time to TR, futility, and efficacy decision rules. We derive recursive formulas and computer simulation algorithm for the distribution of both one and two-stage phase II designs. We further develop an optimal design that balances the needs of concluding the trial findings within required time period and minimizing the expected sample size. In conclusion, a consideration of the time to reaching the trial conclusion can lead to an optimal design with a minimum increase in the expected number of patients but an increased confidence to reaching study conclusions within required time. (�Student Paper Award) Exact Meta-Analysis Approach for the Common Odds Ratio of 2 × 2 Tables with Rare Events �

Dungang Liu, Regina Liu and Minge Xie

Rutgers University [email protected] In the analysis of a number of similar studies with rare events, meta-analysis is often the only approach to drawing reliable inference. However, there remain some concerns about the current metaanalysis approaches when events rates are extremely low and a nonnegligible portion of the studies may have zero events. Conventional approaches either exclude such studies from the analysis or arbitrarily add some positive corrections to these zero events, both of which are known have undesired impact on inference. In this paper, we propose an exact meta-analysis approach for the common odds ratio, which can incorporate all 2 × 2 tables in the analysis and without using artificial continuity corrections. The idea of this approach is to combine significance functions obtained from the exact test results associated with each of the 2 × 2 tables, which is different from conventional approaches of combining point estimates. Our approach can handle the so-called zero total studies and reflect the appreciable difference between the impacts from zero total studies of large versus small trials, for example, zero event out of 1000 cases and 1000 controls versus zero event out of 10 cases and 10 controls. We show that our approach is valid for exact inference and is efficient in large sample settings. Numerical studies using both simulated and real data show that our proposed approach is superior to Mantel-Haenszel method and Peto’s method in the presence of rare events.


Abstracts Session 26: Challenges in Comparative Effectiveness Research Reflections on Heterogeneity and Exchangeability in Comparative Effectiveness Research Demissie Alemayehu Pfizer Inc. [email protected] In Comparative Effectiveness Research (CER), indirect and mixed treatment comparison procedures are commonly used due to the unavailability of head-to-head comparative data from randomized clinical trials for competing treatment options. However, implicit in the available indirect comparison techniques is an assumption of exchangeability, which in practice cannot be conclusively verified. Also implicit in these approaches is the supposition of consistency of results across patient subgroups and studies. When the assumptions are not satisfied, the interpretation of the results may be quite challenging, and often controversial. In view of the considerable implication on public health, the issues have been a focus of considerable research. We evaluate the consequences of violations of these assumptions in the context of CER, and discuss steps that may be taken to minimize the impacts on conclusions drawn from such studies. Taking a More Global Perspective in Choosing from Amongst Multiple Research Designs in the Conduct of Comparative Effectiveness Research Martin J. Zagari Amgen Inc. [email protected] Comparative Effectiveness Research (CER) has been proposed as a means to compare the relative benefits and harms for two or more alternative medical treatments. Despite the emergence of CER as a concept in the United States within the past few years, CER has been employed by numerous publically-funded health care systems the world over. Further, other organizations such as the EMEA (Europe’s equivalent of the US FDA) have proposed other comparative evaluative frameworks, the most widely publicized being the proposed “Relative Effectiveness” (RE) program amongst European Member states. The methods employed by CER are already familiar to most researchers and include dedicated active-control RCTs, adaptive trials, “practical“ clinical trials, cluster randomized trials, natural experiments, analytical techniques such as meta-analysis and indirect comparison (network meta-analysis), and observational research. Each of these study designs has advantages and disadvantages. Among these, the use of observational research has generated both a high degree of interest and controversy. On the interest side are the large volumes of data that might be available for low costs compared to clinical study data. On the controversial side is the greater likelihood that observational data contain both observed and unobserved bias. We will explore CER in the context of other agencies and programs in the global healthcare marketplace that compare medical treatments for value and risk-benefit. We also address factors to consider when choosing from amongst various CER methods and offer case studies for several of these methods. Strengths and Limitations of Administrative Healthcare Databases in Comparative Effectiveness Research Jesse Berlin Johnson & Johnson [email protected] The purpose of Comparative Effectiveness Research (CER) is to assist consumers, clinicians, purchasers, and policy makers to make


informed decisions that will improve health care at both the individual and population levels. There is ongoing discussion as to what types of evidence are appropriate to inform CER, and how best to interpret various forms of evidence. The availability of patient-level data in large databases of health insurance claims and electronic medical records, presents the potential to address CER questions with large sample sizes in actual clinical practice. Several key challenges arise related to the use of non-randomized studies in general, and to the use of administrative databases, in particular, because these databases were created for other purposes than research. For example, while exposure information is available regarding either prescription or dispensing of prescription pharmaceutical products, actual use of the medicines is not captured. Information on important potential confounding variables may be missing, as is information on over-the-counter drug use. These systems are also not designed to collect data on patient-reported outcomes (e.g., symptoms, functional status), whereas they are much better at recording clinical events, such as myocardial infarction, or hospitalizations. Large administrative healthcare databases may be well-suited to addressing some questions, but their use needs to be carefully evaluated in the context of CER. Non-Inferiority Trials and Their Relation to Indirect and/or Mixed Treatment Comparisons Steven Snapinn Amgen Inc. [email protected] As the relatively new field of indirect and/or mixed treatment comparisons evolves, it could be instructive to learn from the experience with non-inferiority trials, since they have some key issues in common. Most notably, non-inferiority trials provide a direct comparison between and experimental treatment and a control, but an indirect comparison between the experimental treatment and placebo is often important as well. The constancy assumption is well known to complicate the interpretation of non-inferiority trials, but it appears by another name in the context of indirect comparisons. In this presentation I will review the issues common to these two classes of trials, and the approaches taken in the context of non-inferiority trials.

Session 27: Challenges and Developments in Survival Analysis Extension of Cure Rate Model When Cured Is Partially Known � Yu Wu1 , Yong Lin2 , Shou-En Lu2 and Weichung J. Shih2 1 K&L Consulting Services Inc. 2 University of Medicine and Dentistry of New Jersey [email protected] When there is evidence of long term survivors, cure rate models have been used by researchers to model the tail behavior of the survival curve. Mixture models were traditionally used, and different parameter estimation approaches for parametric and semiparametric models have been suggested. A common aspect of the traditional cure rate models is that they implicitly assume there is no additional information about the status of cure, thus the indicator of cure has been modeled as latent variable. This assumption is not entirely valid in many cases, when some diagnostic procedure can provide information about the status of cure. This dissertation proposes a novel extension to incorporate the additional information about status of cure in the cure rate models. It also shows that, with this additional information, more efficient estimator can ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts be obtained. The efficiency gain increases with better sensitivity and specificity of the diagnostic procedure. The efficiency gain is larger when the censoring rate is high. This extension can be applied when the latency part is modeled parametrically, semi-parametrically, or non-parametrically. Both proportional hazards (PH) cure rate models and accelerated failure time (AFT) cure rate models can use this model extension. Simulation study and a case study results are presented. A Semiparametric Transformation Cure Model for Interval Censoring � Man-Hua Chen1 and Chen-Hsin Chen2 1 Tamkang University 2 Academia Sinica, Taiwan [email protected] In the conventional analysis of right-censored failure time, its underlying assumption is that all the study subjects are susceptible to the event. A semiparametric transformation cure model for interval censoring is studied for the analysis of failure time with long-term survivors and the observed failure time falls into a certain interval. It combines a logistic regression for the probability of event occurrence with the general class of semiparametric transformation models for the interval time of occurrence. Insights on the Robust Variance Estimator Under RecurrentEvents Model � Hussein Al-Khalidi1 , Yili Hong2 , Thomas Fleming3 and Terry Therneau4 1 Duke University 2 Virginia Polytechnic Institute and State University 3 University of Washington 4 Mayo Clinic [email protected] Recurrent events are common in medical research for subjects who are followed for the duration of a study. For example, cardiovascular patients with an implantable cardioverter defibrillator (ICD) experience recurrent arrhythmic events which are terminated by shocks or anti-tachycardia pacing delivered by the device. In a published randomized clinical trial, a recurrent-event model was used to study the effect of a drug therapy in subjects with ICDs who were experiencing recurrent symptomatic arrhythmic events. Under this model, one expects the robust variance for the estimated treatment effect to diminish when the duration of the trial is extended, due to the additional events observed. However, as shown in this paper, that is not always the case. We investigate this phenomenon using large datasets from this arrhythmia trial and from a diabetes study, with some analytical results, as well as through simulations. Some insights are also provided on existing sample size formulae using our results. Bayesian Transformation Models for Multivariate Survival Times � Mario de Castro1 , Ming-Hui Chen2 and Joseph G. Ibrahim3 1 University of Sao Paulo 2 University of Connecticut 3 University of North Carolina at Chapel Hill [email protected] For the analysis of censored survival data, the Cox proportional hazards model is very popular among practitioners, but in many real life applications the proportionality of the hazard ratios may not be a tenable assumption. In this paper, we develop the Bayesian methodology to carry out inference for a class of transformation models for multivariate failure times. A piecewise exponential model ICSA Applied Statistics Symposium 2011, NYC, June 26-29

is assumed for the baseline hazard function for each failure time and a nice theoretical connection is established between the cure rate and the piecewise hazards. Our proposed Bayesian method allows the transformation parameters to be estimated together with the coefficients of the covariates and the parameter of the frailty distribution. In addition, by introducing several sets of latent variables, an efficient Markov chain sampling algorithm is developed to sample from the posterior distribution. An extensive simulation study is conducted in order to assess some frequentist properties of the Bayesian estimators as well as to examine the performance of Bayesian model comparison criteria such as the deviance information criterion (DIC) and the conditional predictive ordinate (CPO). The proposed methodology is further illustrated using a real data set from a cardiovascular study. A New Nonparametric Estimator for Survival Functions When Censoring Times Are Incomplete � Chung Chang1 and Wei-Yann Tsai2 1 Department of Applied Mathematics, National Sun Yat-sen University 2 Department of Biostatistics, Columbia University [email protected] In the analysis of lifetime data, under some circumstances, censoring times are missing or only known within an interval (e.g., warranty data or medical data). Motivated by such examples, we consider a statistical model in which censoring times are incomplete. In this talk, we will present a new iterative method to obtain a nonparametric estimator of the survival function and conduct a simulation study to discuss its property. A real data application will also be presented.

Session 28: Design and Analysis of Clinical Trials Comparison Study of Different Dose-Finding Designs for Multiple Graded Toxicities in Oncology � Monia Ezzalfani1 , Marie-Cécile Ledeley1 and Sarah Zohar2 1 Institut Gustave Roussy 2 Institut national de la santé et de la recherche médicale, Paris [email protected] The aim of a phase I oncology trial is to identify the recommended dose (RD) with an acceptable level of toxicity. Dose-Limiting Toxicity (DLT) is the usual endpoint of dose-finding trials, but by only considering toxicity above a predetermined threshold, the use of DLTs ignores much information about lower levels of toxicity, which may be relevant for targeted therapies expected to be less toxic. In the present paper, we highlight and discuss the dose-allocation designs that take multiple toxicities into account, developed by Yuan et al. (2007), Ivanova and Kim (2009) and Chen et al. (2010). In addition, we develop an extension of the CRM that treats toxicities as a quasi-continuous endpoint, using a one-parameter logistic model in a frequentist framework. Our work focuses on the comparison of these methods via various simulation studies under multiple relationships between the toxicity measurement and the dose (scenarios). These methods, all of which consider every observed toxicity, differ in their statistical inference and dose-allocation process: the relationship between toxicity scores and doses is modelled by a Bayesian approach in Yuan's method and by a frequentist approach in our proposal, while the dose-escalation algorithm is derived from 'up-and-down' methods for Ivanova's and Chen's methods. Overall, the different methods perform well. Yuan's approach seems to perform better in terms of the percentage of correct


selection of RD and the distance between the estimated dose and the theoretical RD.
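As a rough sketch of the frequentist proposal described above, the snippet below fits a one-parameter logistic curve linking standardized dose labels to quasi-continuous toxicity scores and recommends the dose whose fitted score is closest to a target. The skeleton values, target score, and quasi-likelihood objective are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical standardized dose labels (a CRM-style "skeleton") and observed
# normalized toxicity scores in [0, 1] for the patients treated so far.
skeleton = np.array([-1.5, -1.0, -0.5, 0.0, 0.5])   # one value per dose level
doses_given = np.array([0, 0, 1, 1, 2, 2, 3])        # dose level per patient
scores = np.array([0.05, 0.10, 0.15, 0.30, 0.35, 0.50, 0.65])
target = 0.40                                         # acceptable mean toxicity score

def fitted_score(beta, x):
    """One-parameter logistic dose-toxicity curve with slope exp(beta)."""
    return 1.0 / (1.0 + np.exp(-np.exp(beta) * x))

def neg_quasi_loglik(beta):
    """Bernoulli-type quasi-likelihood treating scores as quasi-continuous."""
    p = np.clip(fitted_score(beta, skeleton[doses_given]), 1e-6, 1 - 1e-6)
    return -np.sum(scores * np.log(p) + (1 - scores) * np.log(1 - p))

beta_hat = minimize_scalar(neg_quasi_loglik, bounds=(-3, 3), method="bounded").x
curve = fitted_score(beta_hat, skeleton)
recommended = int(np.argmin(np.abs(curve - target)))
print("fitted scores by dose:", np.round(curve, 2), "-> recommend level", recommended)
```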


Challenges in Non-inferiority Clinical Studies for Medical Devices Xu Yan U.S. Food and Drug Administration [email protected] A non-inferiority study design is commonly used in medical device clinical studies. Unlike non-inferiority studies for drugs, many non-inferiority studies for medical devices are not double-blinded or concurrently controlled. Therefore, they create more challenges for clinicians and statisticians to design such studies and interpret the data. In this presentation, these challenges will be discussed from a statistical reviewer’s prospective. The discussion will be focused on the determination of non-inferiority margin, selection of active control, problem of un-blinding, and noncompliance to treatment.

The Use of Bayesian Hierarchical Models in the Multi-Regional Clinical Trial for Medical Devices Shiling Ruan U.S. Food and Drug Administration [email protected] Multi-regional clinical trials (MRCT) are on the rise not just for the registration and development of new drugs, but also for the development of new medical devices. Multi-regional trials with the same clinical study protocol usually have a faster enrollment rate, cost less, and can provide a basis to facilitate the simultaneous global development of new medical devices. However, MRCT also present a number of challenges. The main question that regulatory reviewers face when a medical device is evaluated is whether there is reasonable assurance of its safety and effectiveness for the intended population. However, in a multi-regional clinical trial, the overall device effect may not be consistent from region to region. In such situation, the determination of device effectiveness becomes problematic. Bayesian hierarchical models provide a natural approach to design and analyze multi-regional clinical trials. The overall effect and the regional effect can be estimated within a unified framework. This talk will discuss the practical challenges in the evaluation of multi-regional clinical trials with the use of Bayesian hierarchical models.
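To make the pooling idea concrete, here is a minimal empirical-Bayes sketch of a normal-normal hierarchy for region-specific device effects: regional estimates are shrunk toward the overall effect according to their precision and an estimated between-region variance. The regional estimates, standard errors, and the moment estimator used here are illustrative assumptions, not the evaluation model discussed by the speaker.

```python
import numpy as np

# Hypothetical region-level treatment-effect estimates and standard errors.
effect = np.array([0.12, 0.05, 0.20, -0.02])   # one entry per region
se = np.array([0.06, 0.08, 0.10, 0.07])

w = 1.0 / se**2
mu_hat = np.sum(w * effect) / np.sum(w)          # precision-weighted overall effect

# DerSimonian-Laird-type moment estimator of the between-region variance.
q = np.sum(w * (effect - mu_hat) ** 2)
tau2 = max(0.0, (q - (len(effect) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Shrink each regional estimate toward the overall effect.
shrink = tau2 / (tau2 + se**2)                   # weight on the region's own data
regional_post = shrink * effect + (1 - shrink) * mu_hat

print("overall effect:", round(mu_hat, 3), " between-region variance:", round(tau2, 4))
print("shrunken regional effects:", np.round(regional_post, 3))
```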

Analysis of Multi-Regional Clinical Trials: Applying a Two-Tier Procedure to Decision-Making by Individual Local Regulatory Authorities � Yunling Xu and Nelson Lu U.S. Food and Drug Administration, CDRH [email protected] In the past decade, the number of multi-regional clinical trials (MRCT) is increasing for medical products development. The popularity for sponsors conducting an MRCT is mainly due to benefits such as cost-efficiency. However, the presence of inherent regional difference in treatment effect poses great challenges to regulatory decision-making for local regulatory authorities. In recent years, there have been methods proposed for planning sample size for MRCT with assumption on consistency of treatment effect across regions and also methods are developed for assessing consistency of treatment effect across regions. In this presentation, we propose a two-tier procedure for analyzing MRCT data with no assumption on consistency of treatment effect across regions. With examples of randomized controlled superiority trials of medical devices, we illustrate how to apply this procedure to regulatory decision-making by individual local regulatory authorities. Sample size planning with this procedure will be also discussed. Impact of Unequal Sample Sizes on Evaluation of Treatment Difference with Discrete Endpoints � Jin Xu and G. Frank Liu Merck & Co., Inc. jin [email protected] Although majority of late-stage clinical trials allocate equal sample sizes to the two groups being studied, there are circumstances in which studies may need to allocate a larger portion of subjects to the test group than to the control group or vice versa. The unequal sample size allocation could lead to unexpected findings in the safety profile evaluation. If the two treatment groups truly have the same risk profile, one could expect that some of these tests, e.g., 5%, will result in “significant“ findings by chance alone and, more importantly, these chance findings would happen equally for or against a group in a random fashion. However, in this presentation we will show that an unequal sample size allocation can lead to imbalanced chance findings against one of the study groups when the endpoints being evaluated are discrete and the event rates are low, such as rare serious adverse events in clinical trials. We calculate the tail probabilities for testing the difference of two proportions using Miettinen and Nurminen’s method, and show why such imbalanced chance

finding exists and identify the conditions that minimize the imbalance.
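The direction of such chance findings can be checked by simulation. The sketch below uses a simple pooled two-proportion z-test rather than the Miettinen and Nurminen method discussed in the abstract, and the allocation ratio, event rate, and number of replications are hypothetical; it tallies how often a "significant" difference falls against the larger versus the smaller group when the two groups truly share the same low event rate.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2011)
n_test, n_ctrl = 3000, 1000          # hypothetical 3:1 allocation
p_true, alpha, n_sim = 0.002, 0.05, 50_000

against_test = against_ctrl = 0
for _ in range(n_sim):
    x1 = rng.binomial(n_test, p_true)    # events in the larger (test) group
    x2 = rng.binomial(n_ctrl, p_true)    # events in the smaller (control) group
    p_pool = (x1 + x2) / (n_test + n_ctrl)
    if p_pool in (0.0, 1.0):
        continue                          # no test possible without any events
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_ctrl))
    z = (x1 / n_test - x2 / n_ctrl) / se
    if abs(z) > norm.ppf(1 - alpha / 2):
        if z > 0:
            against_test += 1             # excess risk attributed to the test group
        else:
            against_ctrl += 1

print(f"significant against test group:    {against_test / n_sim:.4f}")
print(f"significant against control group: {against_ctrl / n_sim:.4f}")
```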

Session 29: Law and Statistics An Analysis of the Statistical Summary Submitted by the Investment Bank in the SEC v. Goldman-Sachs Case: Did The Regulators Appreciate the Implications of the Data? Joseph Gastwirth George Washington University [email protected] In its response to charges by the Securities and Exchange Commission that Goldman-Sachs misled investors about the role a hedge fund manager had in selecting a securities portfolio involving subprime mortgages, the firm submitted statistics to support its argument that the losses incurred by the portfolio were due to the broad decline in the market for such securities that occurred in 2007-2008. This paper points out a logical flaw in the comparison of performance measures of the initial portfolio of 86 securities with that of the final or reference portfolio of 90. During the selection process in which the hedge fund manager and the bank participated along with the portfolio selection agent, 20 of the original 86 securities were dropped and 24 others added. The statistical summary compares the average values of several performance characteristics of the two portfolios; however, the fact that 66 securities or approximately 75% were common to both renders that comparison virtually meaningless.A more appropriate statistical assessment of that stage of the selection process focuses on the performance of the 24 new securities included in the final portfolio of 90 relative to that of the 20 that were dropped from the Initial-86. A careful examination of the reported statistics shows that the difference in the average percent write-down of these two sets of securities is at least 18%. Furthermore, under an additional assumption that will require further checking from more detailed data, the average time to write-down of the 24 new securities was statistically significantly less than that of the 20 that were dropped. Similar results were obtained when the final portfolio of 90 was compared with the universe of 293 similar securities described in the bank’s response. Both the average ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts percent write-down and the average time to write-down of the 293 securities were less than that of the Reference-90. How Technology is (Rapidly) Expanding the Scope of the Law in Statistics Victoria Stodden Columbia University [email protected] Massive computation is emerging as central to the scientific enterprise, and one largely unaddressed corollary is the increased expansion of the role of the law in statistical research. Reproducible research, the communication of the code and data used to discover scientific results, is fraught with intellectual property barriers; code produced at the university is under patent-pressure for licensing purposes; recent caselaw arguably bolsters the incentives for opacity in software patent applications; and new vistas of patentability are emerging (ie. statistical devices and data-driven classifiers). This paper discusses the legal landscape facing the discipline of statistics, and argues that the law’s impact is not only counter to our scientific norms but increasingly so with the pervasiveness of computational methods. Perspectives on Meta-Analysis from the Avandia Cases � Michael O. Finkelstein1 and Bruce Levin2 1 Columbia Law School 2 Department of Biostatistics, Columbia University [email protected] Combining the results of multiple small trials to increase accuracy and statistical power, meta-analysis has become well established and increasingly important in medical studies, particularly in connection with new drugs. When the data are sparse, as they are in many such cases, certain accepted practices, applied reflexively by researchers, may be misleading because they are biased and for other reasons. We illustrate some of the problems by examining a meta-analysis of the connection between the diabetes drug Avandia (rosiglitizone) and myocardial infarction that was strongly criticized as misleading, but led to thousands of lawsuits being filed against the manufacturer and the FDA acting to restrict access to the drug. Our scrutiny of the Avandia meta-analysis is particularly appropriate because it plays an important role in ongoing litigation, has been sharply criticized, and has been subject to a more searching review in court than meta-analyses of other drugs. Statistical Properties of Tests Used to Detect Disparate Impact in Discrimination Cases � Weiwen Miao1 and Joseph Gastwirth2 1 Haverford College 2 George Washington University [email protected] In discrimination cases concerning promotion or hiring, courts need to decide whether an employment practice, typically an exam or educational requirement, has disparate impact on applicants from a protected group. Government agencies often use a guideline, called the four-fifths rule, i.e. if the pass rate of members of a protected group is less than four-fifths or 80% of the pass rate of majority applicants, the practice in question is deemed to have a disparate impact. Then the employer needs to demonstrate that the exam or requirement is job-related. Not everyone who passes the exam is promoted or hired, especially in situations where the promotions or hires are made according to the ranks of the exam scores and the list of those eligible expires after two or three few years. Hence, in addition to comparing the pass rates, courts have also compared promotion rates of those appointed the first time the list is used, ICSA Applied Statistics Symposium 2011, NYC, June 26-29

the ranks of the exam scores or the average scores of the applicants from the protected and majority groups. When the courts examine the scores with several statistical tests, the problem of multiple comparisons arises. Now the probability of “at least one of the statistical test finding a significant difference”ˇt exceeds the nominal level. We propose two tests that are linear combinations of the tests courts have used. Preliminary results indicate that the new tests keep the size at their nominal level, and have higher power compared to the commonly used tests.
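For reference, the four-fifths guideline and an accompanying significance test are easy to compute directly; the pass counts below are hypothetical, and a one-sided Fisher exact test stands in for the several court-used tests discussed above.

```python
from scipy.stats import fisher_exact

# Hypothetical exam results: (passed, failed) for protected and majority groups.
protected_pass, protected_fail = 30, 70
majority_pass, majority_fail = 60, 40

rate_protected = protected_pass / (protected_pass + protected_fail)
rate_majority = majority_pass / (majority_pass + majority_fail)
impact_ratio = rate_protected / rate_majority

# Four-fifths rule: flag disparate impact if the ratio of pass rates is below 0.8.
flagged = impact_ratio < 0.8

# One-sided Fisher exact test of whether the protected group's pass rate is lower.
_, p_value = fisher_exact([[protected_pass, protected_fail],
                           [majority_pass, majority_fail]],
                          alternative="less")

print(f"pass rates: protected {rate_protected:.2f}, majority {rate_majority:.2f}")
print(f"impact ratio {impact_ratio:.2f} -> four-fifths rule flagged: {flagged}")
print(f"one-sided Fisher exact p-value: {p_value:.4f}")
```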

Session 30: Stochastic Root-Finding and Optimization A Resampling-Based Stochastic Approximation Approach for Analysis of Large Geostatistical Data � Faming Liang, Yichen Cheng and Qifan Song Texas A & M University [email protected] The Gaussian geostatistical model has been widely used for modeling spatial data. However, it is challenging to computationally implement this method because it requires the inversion of a large covariance matrix, particularly, in a case with a large number of observations. In this paper, we propose a resampling-based stochastic approximation approach for tackling this difficulty. At each iteration of the new approach, a small subset of the observations is resampled from the full dataset, and then the current estimate of the parameters is updated within the framework of stochastic approximation. Since the new approach makes use of only a small proportion of the data at each iteration, it avoids inverting large covariance matrices and thus can be applied to very large data sets. Under mild conditions, it is shown that the parameter estimate resulting from the new approach converges in probability to a set of parameter values of equivalent Gaussian probability measures, and that the estimate is asymptotically normally distributed when the model or the reparameterized model is identifiable. To the best of the authors’ knowledge, the present study is the first one on asymptotic normality under infill asymptotics for general covariance functions. The new approach is illustrated with simulated and real large datasets. A Simple Baysian Approach to Multiple Change-Points � Haipeng Xing1 and Tze Leung Lai2 1 State University of New York at Stony Brook 2 Stanford University [email protected] After a brief review of previous frequentist and Bayesian approaches to multiple change-points, we describe a Bayesian model for multiple parameter changes in a multiparameter exponential family. This model has attractive sta- tistical and computational properties and yields explicit recursive formulas for the Bayes estimates of the piecewise constant parameters. Efficient estimators of the hyperparameters of the Bayesian model for the parameter jumps can be used in conjunction, yielding empirical Bayes estimates. The empirical Bayes approach is also applied to solve long-standing frequentist problems such as significance testing of the null hypothesis of no change-points versus multiple change-point alternatives, and inference on the number and locations of change-points that partition the unknown parameter sequence into segments of equal values. Simulation studies of performance and an illustrative application to the British coal mine data are also given. Extensions from the exponential family to general parametric families and from independent observations to genearlized linear time series models are then provided.


A Coupling Spline Method for Stochastic Root-Finding � Kwok-Wah Ho1 and Inchi Hu2 1 The Chinese University of Hong Kong 2 The Hong Kong University of Science and Technology [email protected] We propose an application of smoothing spline methods in stochastic root-finding. The stochastic root-finding problem is essential in many optimization applications. For instance, it is often involved in approximating maximum likelihood estimates for statistical models with intractable likelihood functions. A classical method for handling this problem is to employ stochastic approximation algorithms. In this work, we construct an alternative algorithm based on a spline model specifically designed for the root-finding purpose. Simulation results indicate that the algorithm converges quickly in terms of the number of iterations. This suggests that the new method is valuable when the cost of obtaining a response in each iteration is high. We apply the algorithm to approximating maximum likelihood estimates of spatial models and generalized linear mixed models and obtain favorable experimental results.
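Since the classical baseline mentioned in this session is the stochastic approximation algorithm, here is a minimal Robbins-Monro sketch for finding the root of a regression function observed only through noisy binary responses; the response model, constants, and step-size sequence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_indicator(x):
    """Bernoulli observation whose success probability crosses 1/2 at x = 2."""
    return rng.binomial(1, 1.0 / (1.0 + np.exp(-(x - 2.0))))

def robbins_monro(target=0.5, x0=0.0, a=4.0, n_iter=2000):
    """Classical Robbins-Monro iteration for the root of E[Y | x] - target = 0,
    using only noisy responses and decreasing step sizes a / k."""
    x = x0
    for k in range(1, n_iter + 1):
        x = x - (a / k) * (noisy_indicator(x) - target)
    return x

print("estimated root:", round(robbins_monro(), 2))   # typically lands near 2
```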

Session 31: Functional Data Analysis Quantitative Trait Loci Mapping with Differential Equation Models � Jiguo Cao1 and Rongling Wu2 1 Simon Fraser University 2 The Pennsylvania State University [email protected] Genetic mapping, attributing a phenotypic trait to its underlying genes, known as quantitative trait loci (QTLs), has been proven powerful for constructing the genotype-phenotype relationship. However, the traditional methods often neglect the biological principles underlying the dynamic interactions among different components in a complex biological system. In order to take into account these biological principles, we develop a conceptual model, called systems mapping. A group of ordinary differential equations (ODE) are proposed to quantity how alterations of different components lead to the global change of the biological system under the regulations of specific QTLs. The ODE parameters are estimated from the noisy time-course trait data in a framework of functional mixture models. Through testing genotype-specific differences in ODE parameters, system mapping can identify the genetic effects of QTLs on component-component interactions. System mapping should enable geneticists to shed light on the genetic complexity of the biological system and predict its physiological and pathological states. Integrating Data Transformation in Principal Components Analysis � Mehdi Maadooliat1 , Jianhua Huang1 and Jianhua Hu2 1 Texas A&M University 2 The University of Texas MD Anderson Cancer Center [email protected] Storage and analysis of high-dimensional datasets are always challenging. Dimension reduction techniques are usually used to reduce the complexity of the data and obtain the most informative parts of the datasets. PCA is one of the commonly used dimension reduction techniques. However, PCA does not work well when there are outliers or the data distribution is skewed. One popular solution is to transform the data to resolve the abnormal behavior caused from skewness or presence of outliers. Usually, such transformations can be obtained based on extensive data exploration, previous studies,


or prior knowledge of expertise. In this work, we present an automatic procedure to achieve this goal based on a statistical model with extensions for handling the missing data and functional data structure. The proposed technique transforms the data to vanish the skewness of the data distribution and simultaneously perform the standard PCA to reduce the dimensionality. Our method is cast into a profile likelihood framework for efficient computation. Local Tests for Identifying Anisotropic Diffusion Areas in Human Brain on DTI Tao Yu1 and � Chunming Zhang2 1 National University of Singapore 2 University of Wisconsin-Madison [email protected] Diffusion Tensor Imaging (DTI) plays a key role in nalyzing the physical structures of biological tissues, particularly in reconstructing fiber tracts of the human brain in vivo. Derived eigenvalues of diffusion tensors (DTs), estimated from noisy diffusion weighted imaging (DWI) data, however, usually contain systematic bias, which subsequently bias the diffusivity measurements, such as FA and RA, used in fiber tracking algorithms. Furthermore, since the DTI data are typically spatially structured, ignoring the neighborhood information may diminish the effectiveness of fiber tracking algorithms. This paper aims to establish a test-based approach to identify anisotropic water diffusion areas in the human brain, which, in turn, indicate the areas of fiber tracts. Our proposed test statistic not only takes into account the bias components of estimated eigenvalues, but also incorporates the spatial information of neighboring voxels. Under mild regularity conditions, we demonstrate that the proposed test statistic asymptotically follows a chi2 distribution under the null hypothesis. Simulation studies and real DTI data demonstrate the efficacy of our proposed approach.

Session 32: Recent Developments and Future Prospective in Statistical Methods in Longitudinal Data Estimating Treatment Effects for Episodic Interventions Based on Response-Dependent Observation � Richard Cook, Meaghan Cuerden and Cecilia Cotton University of Waterloo [email protected] We consider the problem of comparing the effects of two therapeutic interventions administered in response to a recurring episodic condition. The data arising in such settings appears like ordinary longitudinal data and it is common for analyses to be based on related methods. When there is a relation between the response to the treatment and the number of episodes requiring treatment, this ignores the dependence between the observation and response process and can lead to biased estimates. We consider some particular models to illustrate the biases that result from naive analyses and use data from a recent transfusion trial for illustration. Inverse probability weighted estimating equations offer one approach to addressing this bias. Efficient Semiparametric Regression for Longitudinal Data with Nonparametric Covariance Estimation Yehua Li University of Georgia [email protected] For longitudinal data, when the within-subject covariance is misspecified, the semiparametric regression estimator could lose efficiency. We propose a method that combines the efficient semiparaICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts metric estimator with nonparametric covariance estimation. The proposed method is robust against mis-specification of covariance models. We show that kernel covariance estimation provides uniformly consistent estimators for the within-subject covariance matrices, and the semiparametric profile estimator with substituted nonparametric covariance is still semiparametrically efficient. The finite sample performance of the proposed estimator is illustrated by simulation studies. In an application to CD4 count data from an AIDS clinical trial, we further extend the proposed method to a functional analysis of covariance model. A Moving Average Cholesky Factor Model in Covariance Modeling for Longitudinal Data Weiping Zhang1 and � Chenlei Leng2 1 University of Science and Technology of China 2 National University of Singapore [email protected] We propose new regression models for parameterizing covariance structures in longitudinal data analysis. Using a novel Cholesky factor, the entries in this decomposition have moving average and log innovation interpretation and are modeled as linear functions of covariates. We propose efficient maximum likelihood estimates for joint mean-covariance analysis based on this decomposition and derive the asymptotic distributions of the resulting coefficient estimates. Furthermore, we study a computationally more efficient local search algorithm than the traditional all subset selection, based on BIC for model selection, and show its model selection consistency. Thus, a key conjecture made by Pan and Mackenzie (2003) can be similarly verified. We demonstrate the finite-sample performance of the proposed method via analysis of the CD4 data and simulations.
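The decomposition underlying the last abstract can be illustrated directly: writing the within-subject covariance as Σ = L D Lᵀ with L lower unit-triangular gives moving-average coefficients (the sub-diagonal entries of L) and log innovation variances (the log of the diagonal of D), which could then be modeled as functions of covariates. The AR(1)-style covariance below is a made-up example, and this sketch only computes the decomposition, not the authors' joint mean-covariance model.

```python
import numpy as np

# Made-up AR(1)-style covariance for 5 repeated measurements on one subject.
t = np.arange(5)
sigma = 1.5 ** 2 * 0.6 ** np.abs(np.subtract.outer(t, t))

# Standard Cholesky factor, then rescale columns so that Sigma = L D L^T
# with L lower unit-triangular and D diagonal.
c = np.linalg.cholesky(sigma)            # lower triangular, Sigma = c c^T
d = np.diag(c) ** 2                      # innovation variances
L = c / np.diag(c)                       # unit diagonal; below-diagonal entries
                                         # are the moving-average coefficients

assert np.allclose(L @ np.diag(d) @ L.T, sigma)

print("moving-average coefficients (last row):", np.round(L[4, :4], 3))
print("log innovation variances:", np.round(np.log(d), 3))
```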

Session 33: High-Dimensional Inference in Biostatistical Applications Ultra Dimension Reduction via Asymptotic Independent Correlation Coefficients Treasa Q. Cui and � Zhengjun Zhang University of Wisconsin-Madison [email protected] The sample based Pearson’s product-moment correlation coefficient and the quotient correlation coefficient are asymptotically independent, which is a very important property as it shows that these two correlation coefficients measure completely different dependencies between two random variables, and they can be very useful if they are simultaneously applied to data analysis. Motivated from this fact, Zhang, Qi, Ma (2011) introduce a new way of combining these two sample based correlation coefficients into maximal strength measures of variable association. Theoretical results and simulation examples show that the new combined measure is clearly superior in detecting dependence between random variables. This work applies the new combined measure and a new marginal distribution transformation method to dimension reduction in ultra-high dimensionspace. Simulation examples show that our methods can be very efficient in variable selections in nonlinear regression models. Biological data analysis will be illustrated. Dimension Reduction and Variable Selection for Censored Regression in Genomics Data � Wenbin Lu and Lexin Li North Carolina State University [email protected] ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Methodology of sufficient dimension reduction (SDR) has offered an effective means to facilitate regression analysis of highdimensional data. When the response is censored, however, most existing SDR estimators cannot be applied, or require some restrictive conditions. In this article, we propose a new class of inverse censoring probability weighted SDR estimators for censored regressions. Moreover, regularization is introduced to achieve simultaneous variable selection and dimension reduction. Empirical performance of the proposed method is examined via simulations and further illustrated with an application to the DLBCL Gene Expression Data. Generalized Thresholding Estimators for High-Dimensional Location Parameters � Min Zhang1 , Dabao Zhang1 and Martin T. Wells2 1 Purdue University 2 Cornell University [email protected] Estimating high-dimensional location parameters is usually involved in analyzing high-throughput biological data. Thresholding estimators can significantly improve such estimation when many parameters are zero. Several estimators have been constructed to be adaptive to parameter sparsity assuming the parameter spaces are symmetric. Since many applications present asymmetric parameter spaces, we introduce a class of generalized thresholding estimators. A construction of these estimators is developed using a Bayes approach, where an important constraint on the hyperparameters is identified. A generalized empirical Bayes implementation is presented for estimating high-dimensional yet sparse normal means. This implementation provides generalized threshoding estimators which are adaptive to both sparsity and asymmetry of highdimensional parameters. High-Dimensional Modeling of Data with Correlated Variables and Structural Constraints with Applications in Statistical Genomics Z. John Daye University of Pennsylvania School of Medicine [email protected] Two outstanding challenges in the analysis of complex and highdimensional data involve the characterization of correlated variables and the incorporation of structural constraints in statistical models. Correlated variables routinely appear in many biological applications, such as gene microarray analysis, where data often exhibit multicollinearity due to the presence of genes or variables that belong to the same system or pathway. Further, genomic covariates often have a priori structures which can be represented as vertices of an undirected graph. In this talk, I will present two interrelated methodologies to approach these important problems. In each case, theoretical studies and simulation comparisons are detailed to demonstrate the efficacy of the method; and real-data applications are provided to illustrate the applicability of the methods in statistical genomics. This talk includes works joint with X. Jessie Jeng, Hongzhe Li, and Jichun Xie.
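As a point of comparison for the generalized thresholding estimators described above, the following is a standard soft-thresholding sketch for sparse high-dimensional normal means with the universal threshold σ√(2 log p); the asymmetric variant with separate thresholds for positive and negative values is only a hypothetical illustration of how asymmetry could enter, not the authors' Bayes construction.

```python
import numpy as np

rng = np.random.default_rng(1)

p, sigma = 10_000, 1.0
theta = np.zeros(p)
theta[:40] = 4.0            # a few large positive means
theta[40:60] = -2.5         # fewer, smaller negative means (asymmetric signal)
y = theta + rng.normal(scale=sigma, size=p)

def soft_threshold(y, lam_pos, lam_neg):
    """Shrink toward zero, allowing different thresholds on each side."""
    out = np.zeros_like(y)
    out[y > lam_pos] = y[y > lam_pos] - lam_pos
    out[y < -lam_neg] = y[y < -lam_neg] + lam_neg
    return out

lam = sigma * np.sqrt(2 * np.log(p))                    # universal threshold
symmetric = soft_threshold(y, lam, lam)
asymmetric = soft_threshold(y, 0.8 * lam, 1.2 * lam)    # hypothetical asymmetry

for name, est in [("symmetric", symmetric), ("asymmetric", asymmetric)]:
    print(name, "MSE:", round(float(np.mean((est - theta) ** 2)), 4))
```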

Session 34: Statistical Machine Learning Functional Additive Regression � Yingying Fan and Gareth James University of Southern California [email protected]


Abstracts We suggest a new method, called “Functional Additive Regression“, or FAR, for efficiently performing high dimensional functional regression. FAR extends the usual linear regression model involving a functional predictor, X(t), and a scalar response, Y, in two key respects. First, FAR uses a penalized least squares optimization approach to efficiently deal with high dimensional problems involving a large number of different functional predictors. Second, FAR extends beyond the standard linear regression setting to fit general non-linear additive models. We demonstrate that FAR can be implemented with a wide range of penalty functions using a highly efficient coordinate descent algorithm. Theoretical results are developed which provide motivation for the FAR optimization criterion. Finally, we show through simulations and two real data sets that FAR can significantly outperform competing methods.


Forward-Lasso with Adaptive Shrinkage Peter Radchenko and � Gareth James University of Southern California [email protected] Recently, considerable interest has focused on variable selection methods in regression situations where the number of predictors, p, is large relative to the number of observations, n. Two commonly applied variable selection approaches are the Lasso, which computes highly shrunk regression coefficients, and Forward Selection, which uses no shrinkage. We propose a new approach, “ForwardLasso Adaptive SHrinkage“ (FLASH), which includes the Lasso and Forward Selection as special cases, and can be used in both the linear regression and the Generalized Linear Model domains. As with the Lasso and Forward Selection, FLASH iteratively adds one variable to the model in a hierarchical fashion but, unlike these methods, at each step adjusts the level of shrinkage so as to optimize the selection of the next variable. We first present FLASH in the linear regression setting and show that it can be fitted using a variant of the computationally efficient LARS algorithm. Then we demonstrate, through numerous simulations and real world data sets, as well as some theoretical analysis, that FLASH generally outperforms many competing approaches.


Pairwise Variable Selection for Classification � Xingye Qiao1 , Yufeng Liu2 and J. S. Marron2 1 State University of New York at Binghamton 2 University of North Carolina at Chapel Hill [email protected] We aim to find groups of important variables that influence the class assignments in the context of classification problems, through their bivariate joint effects. The goal is to identify those variables that may have weak marginal effects, but can lead to accurate classification predictions when they are viewed jointly. To accomplish this, we propose a permutation hypothesis test called Significance test of Joint Effect (SigJEff). The resulting object of SigJEff is a set of pairs of variables with statistically significant joint effects. Two approaches to summarize the significant pairs to a set of individual variables are proposed for the purpose of modeling and prediction. Multiclass Probability Estimation via Large Margin Classifiers Yichao Wu1 , � Hao Helen Zhang1 and Yufeng Liu2 1 North Carolina State University 2 University of North Carolina at Chapel Hill [email protected] Classical approaches for multiclass probability estimation are typically based on regression techniques such as multiple logistic regression, or density estimation approaches such as LDA and QDA. These methods often make certain assumptions on the probability

functions or on the underlying distributions of each subclass. We propose a model-free procedure to estimate multiclass probabilities based on large-margin classifiers. The new estimation scheme is employed by solving a series of weighted large-margin classifiers and then systematically extracting the probability information from these multiple classification rules. A main advantage of the proposed probability estimation technique is that it does not impose any strong parametric assumption on the underlying distribution and can be applied to a wide range of large-margin classification methods. A general computational algorithm is developed for class probability estimation. Furthermore, we establish asymptotic consistency of the probability estimates. Both simulated and real data examples are presented to illustrate the performance of the new procedure.

Session 35: Adaptive Designs Post-FDA Guidance: Challenges and Solutions

Response-Adaptive Dose-Finding under Model Uncertainty � Frank Bretz1 , Bjorn Bornkamp1 , Holger Dette2 and Jose Pinheiro3 1 Novartis Pharmaceuticals Corporation 2 Ruhr-University Bochum 3 Johnson & Johnson [email protected] In pharmaceutical drug development, dose-finding studies are of critical importance because both safety and clinically relevant efficacy have to be demonstrated for a specific dose of a new compound before market authorization. Motivated by a real dose-finding study, we propose response-adaptive designs addressing two major challenges in dose-finding studies: uncertainty about the dose-response models and large variability in parameter estimates. To allocate new cohorts of patients in an ongoing study, we use optimal designs that are robust under model uncertainty. In addition, we use a Bayesian shrinkage approach to stabilize the parameter estimates over the successive interim analyses used in the adaptations. This approach allows us to calculate updated parameter estimates and model probabilities that can then be used to calculate the optimal design for subsequent cohorts. The resulting designs are hence robust with respect to model misspecification and additionally can efficiently adapt to the information accrued in an ongoing study. We focus on adaptive designs for estimating the minimum effective dose, although alternative optimality criteria or mixtures thereof could be used, enabling the design to address multiple objectives. In an extensive simulation study, we investigate the operating characteristics of the proposed methods under a variety of scenarios. Bayesian Response-Adaptive Dose-Ranging Studies: Design and Implementation Challenges � Michael Krams and Jose Pinheiro Johnson & Johnson Pharmaceutical RD, L.L.C. [email protected] The pharmaceutical industry currently faces a pipeline problem, with escalating development costs and decreasing number of approvals. Among the factors contributing to this problem is poor dose selection for confirmatory trials, leading to high attrition rates (estimated at 50%) for Phase 3 programs. Improving the efficiency of dose-finding studies is thus of critical for the survival of the industry. Response-adaptive dose-ranging designs, often based on Bayesian methods, have experienced increasing interest as promising approaches to improve the knowledge efficiency of Phase 2 trials. But they also come with challenges of their own, at the design ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts and implementation levels. In this talk we’ll use the planning of a response-adaptive Bayesian dose-finding study in Alzheimer’s Disease to discuss the design and implementation challenges of this type of trial. Simulations will be used to illustrate how the key challenges can be addressed in practice. Merck Experience since the Food and Drug Administration Guidance on Adaptive Design � Keaven Anderson, Weili He, Jerald Schindler and Yang Song Merck Research Laboratories keaven [email protected] In 2010, many Merck protocols were approved after release of the FDA draft guidance on adaptive design. Over 40% of post-Phase I Merck protocols approved were adaptive in 2010. We will focus on the nature of the planned adaptations in relation to what is “wellunderstood“ and “less-well-understood“ from the guidance’ categorization scheme. An adaptive program for 2 experimental agents will be discussed in relation to the guidance and the need for accelerating development of medicines for patients with substantial unmet medical needs.

Session 36: Spatial Statistics in Bio-medical Applications Partial Likelihood Analysis of Spatio-Temporal Processes Peter Diggle Lancaster University [email protected] The likelihood function associated with a statistical model is key to principled statistical inference, whether Bayesian or classical, but for most of the statistical models used to analyse spatio-temporal phenomena the likelihood function is analytically intractable. For this reason, spatio-temporal model-fitting often uses either ad hoc estimation methods or computationally intensive Monte Carlo methods that require careful tuning to each application. Partial likelihood was proposed by Sir David Cox in 1972, in a famous paper that introduced the now very widely used class of proportional hazards models for survival data. In this talk, I will first make some general comments on the role of statistical modelling, contrasting empirical and mechanistic modelling strategies. I will then describe an adaptation of the Cox partial likelihood method to spatio-temporal modelling and show how it can be more tractable than the full likelihood as a basis for inference. Finally, I will describe three applications: the spread of foot-andmouth disease in Cumbria during the 2001 epidemic; the nesting pattern of a colony of Arctic terns; the early phase of the 2009 swine ’flu epidemic in the West Midlands. On Low-Rank Dynamic Space-Time Models for Large Georeferenced Data � Sudipto Banerjee1 , Andrew O. Finley2 , Rajarshi Guhaniyogi1 and Qian Ren1 1 University of Minnesota 2 Michigan State University [email protected] We discuss Bayesian hierarchical models for analysing relationships in large spatially referenced datasets. We show how dimension reducing stochastic processes can be embedded within hierarchical models to achieve our data analytic goals. In particular, we focus upon improving the applicability of a previously proposed class of dynamic space-time models by allowing them to accommodate large data sets. We focus on the common setting where space is ICSA Applied Statistics Symposium 2011, NYC, June 26-29

viewed as continuous but time is taken to be discrete. Scalability is achieved by using a low-rank predictive process to reduce the dimensionality of the data and ease the computational burden of estimating the spatial-temporal process of interest. The proposed models are illustrated using health, pollutant and climate data collected over the north-eastern United States between 2000-2005. Here, our interest is to use readily available covariates, association among measurements at a given station, as well as dependence across space and time to improve prediction for health outcomes such as asthma hospitalizations in the domain. Correlated Prior Models for Hidden Grouping in Small Area AFT Survival � Andrew Lawson1 and Jiajia Zhang2 1 Medical University of South Carolina 2 University of Southern California [email protected] A flexible approach to contextual modeling of geo-referenced survival is considered. We examine an accelerated failure time model for vital outcome following prostate diagnosis. Our models examine contextual spatial effects and hidden grouping of covariate parameters. We examine different spatial correlation prior distributions for the group labeling of spatial units, including both Ising/Potts and threshold CAR prescriptions. Our focus is SEER cancer registry data for prostate cancer. We apply our models to small area cancer data taken from the SEER Louisiana registry. Individual cancer diagnoses are the main inclusion criteria and we also consider geographically adaptive clustering of prostate risk within this state. Burgers and Fried Chicken: Characterizing the Spatial Distribution of Fast Food Restaurants in New York City � Ji Meng Loh1 and Naa Oyo Kwate2 1 AT&T Labs Research 2 Rutgers University [email protected] In recent years, there has been a marked increase in U.S. obesity rates, especially among children and the disadvantaged. Fast food has received attention as a factor in obesity both in adults and children and various research studies have suggested links between obesity and fast food consumption. Furthermore, there is evidence suggesting that food environments in residential neighborhoods and around schools can play an important role in terms of the diversity of dining options available. We will describe our ongoing efforts at understanding the spatial pattern of fast food restaurant locations in New York City, compared with that of all restaurant locations. We model the differences in terms of census demographic data and the number of nearby schools. We will describe a spatial anomaly detection method called K-scan and use it to identify geographic areas where there are more fast food restaurants than expected based on the underlying restaurant intensity.
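A much simplified stand-in for the K-scan idea in the last abstract: bin restaurant locations into grid cells, take the citywide fast-food share as the baseline, and flag cells whose fast-food count is improbably large under a Poisson model for the expected count. The coordinates, grid, and thresholds below are all hypothetical.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(7)

# Hypothetical restaurant coordinates on a unit square; a minority are fast food,
# with an artificial excess of fast food planted in the lower-left corner.
all_xy = rng.uniform(size=(3000, 2))
is_ff = rng.uniform(size=3000) < 0.15
is_ff |= (all_xy[:, 0] < 0.2) & (all_xy[:, 1] < 0.2) & (rng.uniform(size=3000) < 0.3)

bins = np.linspace(0.0, 1.0, 6)                      # 5 x 5 grid
tot, _, _ = np.histogram2d(all_xy[:, 0], all_xy[:, 1], bins=[bins, bins])
ff, _, _ = np.histogram2d(all_xy[is_ff, 0], all_xy[is_ff, 1], bins=[bins, bins])

overall_share = is_ff.mean()
expected = overall_share * tot                       # baseline from all restaurants
p_vals = poisson.sf(ff - 1, expected)                # P(count >= observed | expected)

for i, j in zip(*np.where(p_vals < 0.01)):
    print(f"cell ({i}, {j}): {int(ff[i, j])} fast food vs {expected[i, j]:.1f} expected,"
          f" p = {p_vals[i, j]:.4f}")
```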

Session 37: Nonparametric Inference and Secondary Analysis in Genome-wide Studies Nonparametric Election Forensics Raul Jimenez Universidad Carlos III de Madrid [email protected] The best way to reconcile political actors in a controversial electoral process is a full audit. When this is not possible, statistical tools may be useful for measuring the likelihood of the results. The cost of errors in examining an allegation of fraud can be enormous.


They can range from legitimizing an unfair election to supporting an unfounded accusation, with serious political implications. For this reason, we must be very selective about the data, hypotheses and test statistics that will be used. We offer a critical review of recent statistical literature on Election Forensics, questioning most of the methodologies employed. In addition, we propose a nonparametric approach, based exclusively on vote counting, to address this important problem. Some elections are reexamined, offering new and intriguing aspects to previous analyses. Efficient Adaptively Weighted Analysis of Secondary Phenotypes in Case-Control Genome-wide Association Studies � Huilin Li1 and Mitchell H. Gail2 1 New York University 2 National Cancer Institute [email protected] We propose and compare methods of analysis for detecting associations between genotypes of a single nucleotide polymorphism (SNP) and a dichotomous secondary phenotype (X), when the data arise from a case-control study of a primary dichotomous phenotype (D), which is not rare. We assume that the genotype (G) is dichotomous, as in recessive or dominant models. To estimate the log odds ratio relating X to G in the general population, one needs to understand the conditional distribution [D|X, G] in the general population. For the most general model of [D|X, G], one needs external data on P(D = 1) to estimate it. We show that for this “full model”, maximum likelihood (FM) corresponds to a previously proposed weighted logistic regression approach. Efficiency can be gained by assuming that [D|X, G] is a logistic model with no interaction between X and G (the “reduced model”). However, the resulting maximum likelihood (RM) can be misleading in the presence of interactions. We therefore propose an adaptively weighted approach (AW) that captures the efficiency of RM but is robust to rare SNPs with interactions. We study the robustness of FM, RM and AW to misspecification of P(D = 1). In principle, one should be able to estimate the log odds ratio without external information on P(D = 1) under the reduced model. However, our simulations show that the resulting inference is unreliable. Therefore, in practice one needs to introduce external information on P(D = 1), even in the absence of interactions between X and G. Non-Parametric Estimation of Surface Integrals Raul Jimenez1 and � Joe Yukich2 1 Universidad Carlos III de Madrid 2 Lehigh University [email protected] Let X_i, i ≥ 1, be an i.i.d. uniform sample in [0, 1]^d, d ≥ 2, and let G ⊂ [0, 1]^d have boundary ∂G, which in general is unknown and not smooth. We review ways to use the sample X_i to (i) reconstruct ∂G and (ii) estimate the volume content of ∂G with consistent estimators. More generally, if h : [0, 1]^d → R is unknown, but the values h(X_i), i ≥ 1, are knowable, then we consider estimators of the surface integral ∫_{∂G} h dx, and we establish their consistency and asymptotic normality. The talk extends joint work with R. Jimenez (Annals of Statistics, 2011, 39, 232-260). Non-Parametric Bayesian Techniques and Models for Community Identification � Jiqiang Guo, Alyson Wilson and Dan Nordman Iowa State University [email protected] We consider the problem of identifying communities from networks

66

using Bayesian clusters methods. Using a widely-used data set in evaluating community detection algorithms, we developed a series of statistical models to better model the reality as well as corresponding MCMC algorithms to sample from the posterior distribution. In these models, we take advantage of non-parametric Bayesian techniques, in which we do not need to specify number of communities a priori. By using statistical models to tackle community detection, we are in a better position to study some issues such as statistical significance. Additionally, decision theory based approaches are used to obtain point estimator of a set of communities for a network. Joint work with Drs Wilson and Nordman. Key words: community detection, non-parametric Bayesian, Chinese restaurant process, Dirichlet process, decision theory, network

Session 38: Manufacturing and Quality Assessment Development of Content Uniformity Test for Large Sample Sizes Using Counting Method * Meiyu Shen and Yi Tsong U.S. Food and Drug Administration [email protected] Dose content uniformity (DCU) testing is required by FDA to confirm that the dose content of the drug product is consistent with the label claim. ICH guideline Q6A recommends a pharmacopeial procedure for assessing the uniformity of dosage units. None of the current pharmacopeial procedures offers guidance on how to carry out the test at various sample sizes. The PhRMA CMC Expert Team (DIJ, 2006) proposed a sampling acceptance procedure based on a counting method for large sample sizes. That approach was evaluated to be biased. In 2011, FDA statisticians proposed a sample size dependent specification for testing dose content uniformity based on a normal distribution assumption (Dong, 2011). However, a normal large sample test may lead to an increase in false acceptance. In this project, we will revisit the counting method-based sampling acceptance procedure for large sample sizes. The approach will be based on the tolerance interval concept and will be developed with sample size dependent specifications. A simulation study will be carried out to compare the normal distribution based tolerance interval approach and the counting method approach developed by the PhRMA CMC Expert Team. Tolerance Interval Approaches and Hypothesis Testing for Pharmaceutical Quality Assessment Yi Tsong1 , Xiaoyu Dong2 , Meiyu Shen1 and * Jinglin Zhong1 1 U.S. Food and Drug Administration 2 University of Maryland Baltimore County [email protected] One of the major quality assessments of pharmaceutical products is to determine whether a sufficiently large proportion of the lot produced is within the specification limits, or equivalently whether a sufficiently small proportion of the product is beyond the pre-specified specification limits. The content expectation tolerance interval is often used for such objectives. In this article, we evaluate various content expectation tolerance intervals in accordance with the hypotheses that the quality testing addresses. We further study the statistical properties of the testing procedures. Development of Sample Size Dependent Dose Content Uniformity Specification * Xiaoyu Dong1 , Yi Tsong2 , Meiyu Shen2 and Jinglin Zhong2 1 University of Maryland Baltimore County

2

U.S. Food and Drug Administration, CDER/OB/Division of Biometrics VI [email protected] One of the major quality assessments of pharmaceutical products is to determine the dose content uniformity of tablets in a manufactured lot. The United States Pharmacopeia sampling acceptance test is the industrial standard to determine whether a lot satisfies dose content uniformity when it is randomly assessed during its shelf life. Typically, it is a two-stage procedure consisting of 10 tablets in the first stage. When the lot is not accepted in the first stage, an additional twenty tablets will be assessed to determine if the lot satisfies the specification. The current USP procedure is a tolerance interval based procedure that determines whether the lot, at the sampling time point, has a sufficiently large proportion within the specification limits, or equivalently a sufficiently small proportion beyond the pre-specified specification limits. Because of improvements in quality testing facilities and the regulatory requirements of quality by design, acceptance testing with larger sample sizes is encouraged by the regulatory authority. However, the specification associated with a larger sample is not yet clearly defined for this objective. In this article, we evaluate various content expectation tolerance interval approaches in accordance with the hypotheses that the quality testing addresses. Based on the study results, we derive the relationship among sample size, power, and quality specification. With the derived power function, we propose a generalization of the current sampling acceptance procedure used by the US compendia to lot release testing with large sample sizes while maintaining power to protect the manufacturer's risk.
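To make the tolerance-interval idea behind these acceptance procedures concrete, here is a small R sketch (simulated content values and illustrative limits only; this is not the FDA or USP procedure) that computes an approximate two-sided normal tolerance interval and checks it against specification limits. The tolerance factor uses the common Wald-Wolfowitz chi-square approximation.

# Approximate two-sided normal tolerance interval: covers proportion p with confidence 1 - alpha
set.seed(2)
x <- rnorm(100, mean = 100, sd = 3)     # simulated content (% of label claim) for n = 100 units
n <- length(x); p <- 0.90; alpha <- 0.05
k <- sqrt((n - 1) * qchisq(p, df = 1, ncp = 1/n) / qchisq(alpha, df = n - 1))
interval <- mean(x) + c(-1, 1) * k * sd(x)
spec <- c(85, 115)                      # illustrative specification limits
accept <- interval[1] >= spec[1] && interval[2] <= spec[2]
c(lower = interval[1], upper = interval[2], accept = accept)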

Session 39: Next Generation Pharmacovigilance: Methodological and Policy Challenges Approaches to Improving Safety Monitoring of Biologic Products at Food and Drug Administration/CBER Robert Ball U.S. Food and Drug Administration, CBER/OBE [email protected] Monitoring safety of biologic products in actual use depends on a variety of data sources including spontaneous reporting systems, health care system administrative databases, and medical records. Integration of this information with other data sources, such as clinical trial results, plays an important role in the identification and validation of safety signals. Current methods often rely on a combination of formal epidemiological and statistical methods along with expert judgment. A major outstanding challenge is using these multi-dimensional data to facilitate making valid inferences more efficiently, consistently, and rigorously than current methods allow. Research at the FDA, CBER, Office of Biostatistics and Epidemiology focuses on a pattern recognition approach that includes automated feature extraction with text mining, near real time surveillance methods, and other novel approaches to improve the causal inference framework applied to safety monitoring of biologic products. Use of Geographic Variation in Comparative Effectiveness and Pharmacovigilance Studies * Mary Beth Landrum1 , Frank Yoon1 , Elizabeth Lamont1 , Ellen Meara2 , Amitabh Chandra3 and Nancy Keating1 1 Harvard Medical School 2 Dartmouth Medical School 3 Harvard Kennedy School

[email protected] In this era of health reform in the United States there is an increasing call for better information on the benefits and risks of medical treatments in real world settings. These efforts must in part rely on analysis of observational data. Estimating benefits and risks of treatments in non-randomized settings requires careful data collection and analysis because of possible unobserved differences between patient groups. Increasingly, researchers are exploiting geographic variation in use of services to infer their effectiveness in individual patients, under the assumption that patient characteristics are balanced across geographic areas. However, little is understood about settings in which ecological variation can be used to obtain valid estimates of treatment effects in non-randomized studies. In this talk we describe the use of variation in receipt of treatments within and across geographic areas to estimate the benefits and risks of treatments. We describe assumptions underlying alternative approaches and discuss both model-based and propensity score approaches to balance observed characteristics within and across areas. We illustrate our approach using data from population-based cohorts of elderly patients diagnosed with lung and prostate cancer in the Surveillance, Epidemiology, and End Results (SEER)-Medicare database. A Statistical Algorithm for Adverse Event Identification Marianthi Markatou IBM Thomas J. Watson Research Center and Cornell University [email protected] Adverse events are associated with almost all medical products, including medications, vaccines and devices; they are a serious problem world-wide, leading to increased hospital care, injuries and death. Thus, continued monitoring is essential for patient safety and cost savings. We will describe a statistical algorithm that can be used to strengthen and generate hypotheses for adverse event identification. The method has a strong theoretical basis, and statistically significant results are simultaneously scientifically significant. Examples will be provided using discharge summaries and data from electronic health records.
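As a minimal sketch of the propensity-score weighting idea discussed in the Landrum et al. abstract above (simulated data and a deliberately simple model; this is not the authors' SEER-Medicare analysis), the R code below estimates a treatment effect by inverse-probability-of-treatment weighting.

# Inverse-probability-of-treatment weighting with an estimated propensity score
set.seed(3)
n   <- 2000
age <- rnorm(n, 70, 8); stage <- rbinom(n, 1, 0.4)
ps  <- plogis(-2 + 0.03 * age - 0.8 * stage)       # assumed treatment-assignment model
trt <- rbinom(n, 1, ps)
y   <- 1 + 0.5 * trt - 0.02 * age + 0.7 * stage + rnorm(n)
fit <- glm(trt ~ age + stage, family = binomial)   # estimated propensity score
e   <- fitted(fit)
w   <- ifelse(trt == 1, 1 / e, 1 / (1 - e))        # IPTW weights
ate <- weighted.mean(y[trt == 1], w[trt == 1]) - weighted.mean(y[trt == 0], w[trt == 0])
ate   # should be close to the simulated effect of 0.5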

Session 40: Statistics in Drug Discovery and Early Development Strategies and Tools for Hit Selection in High Throughput Screening Andy Liaw Merck & Co., Inc. andy [email protected] High-throughput screening technologies have been an important tool for drug discovery since the late 1990s. It involves utilizing highly automated robotic systems to perform miniaturized biochemical assays, screening a large collection of chemical compounds against biological targets of interest. The system is subject to artifacts that can cause bias in the assay readout, which in turn leads to higher false positive and false negative rates. Over the past decade, several papers have been published on the topic of hit selection. Unfortunately some contain misleading statistical ideas. We compare several commonly used methods for hit selection, and provide a statistically rigorous discussion of them. We also advocate a strategy that uses multiple hit scoring methods. With regard to the selection threshold, we favor the "top X" approach over the "3-sigma" approach, and we will demonstrate the reasons. An interactive graphical tool, developed for use by assay scientists to implement the recommended strategy, will also be demonstrated.
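A toy R sketch of the two selection rules contrasted above (simulated plate readouts and arbitrary cutoffs; not the author's interactive tool): robust z-scores are computed from the plate median and MAD, then hits are taken either as the top X compounds or as those beyond 3 robust SDs.

# Hit selection on one simulated plate: "top X" versus "3-sigma" rules
set.seed(4)
activity <- c(rnorm(950), rnorm(50, mean = 4))      # 50 true actives among 1000 compounds
z <- (activity - median(activity)) / mad(activity)  # robust z-score
hits_topX <- order(z, decreasing = TRUE)[1:50]      # "top X" rule with X = 50
hits_3sig <- which(z > 3)                           # "3-sigma" rule
length(hits_3sig); length(intersect(hits_topX, hits_3sig))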


Robust Small Sample Inference for Fixed Effects in General Gaussian Linear Models * Chunpeng Fan1 , Donghui Zhang1 and Cun-Hui Zhang2 1 Sanofi-Aventis U.S. Inc. 2 Rutgers University [email protected] Although asymptotically, the empirical covariance estimator is consistent and robust with respect to the selection of the working correlation matrix, when the sample size is small, its bias may not be negligible. This paper proposes a small sample correction for the empirical covariance estimator in general Gaussian linear models. Inference for the fixed effects based on the corrected covariance matrix is also derived. A two-way ANOVA model with repeated measures which evaluates the effectiveness of a CB1 receptor antagonist and a four-period crossover design which assesses the treatment effect in subjects with intermittent claudication serve as examples to illustrate the proposed and other investigated methods. Simulation studies show that the proposed method generally performs better than other bias-correction methods, including Mancl and DeRouen (2001, Biometrics 57, 126-134), Kauermann and Carroll (2001, JASA 96, 1387-1398), and Fay and Graubard (2001, Biometrics 57, 1198-1206), in the investigated balanced designs. Validation of Cell Based Image Features as Predictive Safety Biomarkers for DILI * Donghui Zhang and Chunpeng Fan Sanofi-Aventis U.S. Inc. [email protected] Drug-induced liver injury (DILI) is a common cause of drug non-approvals and withdrawals. A cell-based in vitro model using high content screening (HCS) can be a promising predictive tool for early detection of DILI. To validate this model, compounds including both safe and toxic drugs are tested on human HepG2 cells in a dose-response fashion at different treatment durations. Multiple cell morphologic and mechanistic features are extracted as potential predictive biomarkers. In this talk, we present our work in 1) flexible dose-response curve fitting that can handle a variety of compounds; 2) compound classification based on similarity to safe and toxic compounds; and 3) compound safety profiling assessment. PopPK Modeling for Oncology Drug Vorinostat with D-optimal Design Guided Sparse Sampling * Xiaoli Hou1 , Comisar Wendy1 , Nancy Agrawal1 and Bo Jin2 1 Merck & Co., Inc. 2 Pfizer Inc. xiaoli [email protected] Vorinostat is an oncology drug approved in the U.S. in 2006 as a monotherapy for the treatment of cutaneous T-cell lymphoma. The pharmacokinetics of vorinostat are highly variable and data were somewhat limited (given that all clinical studies need to be conducted in patients rather than healthy subjects), which complicated the development of population PK models. Additionally, a sparse pharmacokinetic sampling scheme that would be convenient for the patient population was needed for the Phase II/III studies. This sparse sampling scheme needed to enable us to successfully obtain post-hoc parameter estimates for patients in Phase II/III studies. With all of these factors in mind, we were able to fit a first-pass model to the Phase I data, which characterized the PK parameters fairly well. Based on this model, we used a D-optimal design to strategically identify an appropriate sparse sampling scheme for the Phase II/III studies, which was critical since the quality of population PK parameter estimates is a function of the experimental design, including the number of concentrations measured per subject,


the timing of each blood sample, and the number of subjects. The population PK model was successfully updated with the additional Phase II/III data as it became available, which also validates the sparse sampling scheme.

Session 41: Large Scale Data Analysis and Dimension Reduction Techniques in Regression Models A Note on Sliced Inverse Regression with Missing Predictors * Yuexiao Dong1 and Liping Zhu2 1 Temple University 2 Shanghai University of Finance and Economics [email protected] Sufficient dimension reduction is effective in high-dimensional data analysis as it mitigates the curse of dimensionality while retaining full regression information. Missing predictors are common in high-dimensional data, yet are only discussed occasionally in the sufficient dimension reduction context. In this paper, an inverse probability weighted sliced inverse regression is studied with predictors missing at random. We cast sliced inverse regression into the estimating equation framework to avoid inverting a large scale covariance matrix. This strategy is more efficient in handling large dimensionality and strong collinearity among the predictors than the spectral decomposition of classical sliced inverse regression. Numerical studies confirm the superiority of our proposed procedure over existing methods. Functional Mapping with Robust t-Distribution * Cen Wu and Yuehua Cui Department of Statistics and Probability, Michigan State University [email protected] Functional mapping has been a powerful tool in mapping quantitative trait loci (QTL) underlying dynamic traits of agricultural or biomedical interest. In functional mapping, multivariate normality is often assumed for the underlying data distribution, partially due to the ease of parameter estimation. When QTLs are present, heavy tails are often observed due to the mixture distribution of the data at the QTL locus. Thus, the normality assumption is often violated, especially under small sample sizes. Departure from normality has a negative effect on testing power and inference for QTL identification. In this work, we relax the normality assumption and propose a robust multivariate t-distribution mapping framework for QTL identification in functional mapping. Simulation studies show increased mapping power with the t-distribution compared with the normal distribution. The utility of the method is demonstrated through a real data analysis. Sufficient Dimension Reduction Based on Hellinger Integral Xiangrong Yin1 , Frank Critchley2 and * Qin Wang3 1 University of Georgia 2 The Open University 3 Virginia Commonwealth University [email protected] Sufficient dimension reduction provides a useful tool to study the dependence between a response Y and a multidimensional regressor X. A new formulation is proposed here based on the Hellinger integral of order two – and so jointly local in (X, Y) – together with an efficient estimation algorithm. The link between the chi-squared divergence and dimension reduction subspaces is the key to our approach, which has a number of strengths. It requires minimal (essentially, just existence) assumptions. Multidimensional (discrete, continuous or mixed) Y as well as X are allowed. It unifies three

existing methods, each being shown to be equivalent to adopting suitably weighted forms of the Hellinger integral. b-bit Minwise Hashing for Ultra-Large Scale Data Analysis * Ping Li1 and Christian Konig2 1 Cornell University 2 Microsoft Research [email protected] b-bit minwise hashing is a recent breakthrough for computing set similarities, a fundamental problem in numerous applications including statistical learning, data mining, databases, and information retrieval. For example, the entire Web may be represented as an enormous (binary) data matrix of 10^11 pages and 2^64 dimensions; and many similarity-based applications such as duplicate detection and classification/clustering are conceptually operated on this matrix. b-bit minwise hashing is a substantial improvement over the standard minwise hashing algorithm, which is a popular randomized algorithm widely used in industry. b-bit hashing only stores the lowest b bits (instead of 64 bits) of each hashed value. It can be shown that, even in the least favorable situation, using 1-bit hashing can reduce the required storage by a significant factor of 21.3 when the desired similarity is at least 50%. Hessian Inverse Transformation for Dimension Reduction Heng-Hui Lue Tunghai University [email protected] Many model-free dimension reduction methods have been developed for nonlinear high-dimensional regression problems. In this paper, we propose a response transformation based Hessian directions method to reduce the dimension of predictors without requiring a prespecified parametric model. The benefit of using geometrical information from our method is highlighted. This method is extended for its ability to explore nonlinear regression data with nonlinear confounding in predictors. The weighted chi-squared test of dimension for our method is derived. Several examples are reported for illustration and comparisons are made with the sliced inverse regression of Li (1997).
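To illustrate the mechanics of 1-bit minwise hashing described above (a toy sketch using explicit random permutations over a small universe; this is not the authors' implementation, and the simple estimator below relies on the sparse-data approximation), consider the following R code.

# 1-bit minwise hashing estimate of the resemblance (Jaccard similarity) of two binary sets
set.seed(5)
D  <- 10000                                  # size of the binary universe
S1 <- sample(D, 300); S2 <- c(sample(S1, 200), sample(setdiff(1:D, S1), 100))
k  <- 500                                    # number of independent permutations
match1bit <- replicate(k, {
  perm <- sample(D)                          # random permutation of the universe
  m1 <- min(perm[S1]); m2 <- min(perm[S2])   # minwise hash values
  (m1 %% 2) == (m2 %% 2)                     # compare only the lowest bit
})
Rhat <- 2 * mean(match1bit) - 1              # approximate estimate when the sets are sparse
c(estimate = Rhat, truth = length(intersect(S1, S2)) / length(union(S1, S2)))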

Session 42: New Approaches for Design and Estimation Issues in Clinical Trials Mixed Effect Models in Cross-over Studies Fei Wang Boehringer Ingelheim Pharmaceuticals fei.wang@Boehringer Ingelheim.com A recent observation by Kenward and Roger (2010) pointed out some potential bias when adjusting for baseline covariates in a cross-over study. Proposals were given using fixed subject effects and a mixed effect model requiring separate within-period and between-period regression coefficients. However, this triggers some doubts about the efficiency of the mixed effect model in a cross-over study. We use an example from Senn (1999) to illustrate the use of the mixed effect model in a stepwise fashion. Model selection criteria will be discussed. Senn S. (1999). Cross-over Trials in Clinical Research. John Wiley & Sons Ltd. Kenward M.G. and Roger J.H. (2010). The use of baseline covariates in cross-over studies. Biostatistics 11(1): 1-17. A Comparison of Methods for Pretest-Posttest Trials with Small Samples * Xiao Sun and Devan Mehrotra Merck & Co., Inc.

xiao [email protected] The pretest-posttest setting is commonplace in clinical trials. One typical trial objective is to detect a difference in the response between two treatment groups. The response variable is measured at or before randomization (pretest) and at pre-specified follow-up times (posttest). In search of the most efficient way to analyze this type of data, we compared various common analysis methods and concluded that the ANCOVA method is generally best when its assumptions are valid. However, due to small sample sizes and outlying observations, the normality assumption is often suspect. As such, we further looked into semiparametric (GEE), robust parametric (robust regression) and non-parametric (rank based) methods. Simulations were used to compare the different methods under various conditions, and practical recommendations for statisticians will be highlighted. Multiplicity Issues in Clinical Trials with Co-primary Endpoints, Secondary Endpoints and Multiple Dose Comparisons Haiyan Xu Johnson & Johnson Pharmaceutical R&D, L.L.C. [email protected] In clinical trials there are situations when the overall type I error rate needs to be controlled across co-primary endpoints and secondary endpoints. For example, a health authority may require improvement in both pain and functioning as compared to placebo for the treatment of pain due to osteoarthritis. The sponsor may also be interested in claiming efficacy for additional secondary endpoints in such a trial. The multiplicity issue can be further complicated by multiple dose comparisons. This paper proposes several methods that control the overall type I error for these situations. These methods are constructed using the closed testing principle and the partitioning principle. This paper also compares these methods with the IUT (intersection-union test) based method that is commonly used in testing co-primary endpoints. Use of the Average Baseline versus the Time-Matched Baseline in Parallel Group Thorough QT/QTc Studies * Zhaoling Meng1 , Li Fan1 , Hui Quan1 , Robert Kringle1 and Gordon Sun2 1 Sanofi-Aventis U.S. Inc. 2 Celgene Corporation [email protected] In TQT studies, we compared the impact of the average baseline and the time-matched baseline on the diurnal effect correction, treatment effect estimation and ANOVA/ANCOVA analysis model efficiency. We formulated and derived conditions for achieving better efficiency in the diurnal effect correction and treatment comparison. We demonstrate better efficiency using the average baseline compared to the time-matched baseline under our thorough QT/QTc study conditions. When there is observed baseline imbalance under randomization, the time-matched baseline ANCOVA models usually result in larger bias than the ANOVA models and the average baseline ANCOVA model. Real data and simulations were used to demonstrate our findings. Sample Size and Power Calculation for Poisson Data in Clinical Trials with Flexible Treatment Duration * Lin Wang and Lynn Wei Sanofi-Aventis U.S. Inc. [email protected] Poisson regression is routinely used to model response rates or clinical count data as a function of covariate levels. The current work presents a method to calculate sample size and power for Poisson


data in clinical trials with flexible treatment duration. The method also takes the recruitment rate into consideration.
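As a simple, simulation-based sketch of the kind of power calculation described in the last abstract (hypothetical rates, follow-up distribution, and sample size; not the authors' analytic method), the R code below simulates Poisson counts with subject-specific exposure times and estimates power for a Poisson regression comparison of two arms.

# Simulated power for a two-arm comparison of Poisson event rates with variable follow-up
set.seed(6)
power_sim <- function(n_per_arm, rate0 = 0.8, rate_ratio = 0.7, nsim = 500) {
  rejections <- replicate(nsim, {
    grp  <- rep(0:1, each = n_per_arm)
    time <- runif(2 * n_per_arm, 0.5, 2)                    # flexible treatment duration (years)
    y    <- rpois(2 * n_per_arm, lambda = rate0 * rate_ratio^grp * time)
    fit  <- glm(y ~ grp + offset(log(time)), family = poisson)
    summary(fit)$coefficients["grp", "Pr(>|z|)"] < 0.05
  })
  mean(rejections)
}
power_sim(150)   # estimated power for 150 subjects per arm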

Session 43: Recent Development in Multivariate Survival Data Analysis Analysis of Recurrent Episodes Data: the Length-Frequency Tradeoff Jason Fine University of North Carolina at Chapel Hill [email protected] I consider a special type of recurrent event data, "recurrent episode data", in which, when an event occurs, it lasts for a random length of time. Recurrent episode data arise frequently in studies of episodic illness. A naive recurrent event analysis disregards the length of each episode, which may contain important information about the severity of the disease, as well as the associated medical cost and quality of life. Analysis of recurrent episode data can be further complicated if the effects of treatment and other prognostic factors are not constant over the observation period, as occurs when the covariate effects vary across episodes. I will review existing methods applied to recurrent episode data and approach the length-frequency tradeoff using recently developed temporal process regression. Novel endpoints are constructed which summarize both episode length and frequency. Time-varying coefficient models are proposed, which capture time-varying covariate effects. New and existing methods are compared on data from a clinical trial to assess the efficacy of a treatment for cystic fibrosis patients experiencing multiple pulmonary exacerbations. Projecting Population Risk with Time-to-Event Outcome * Dandan Liu, Li Hsu and Yingye Zheng Fred Hutchinson Cancer Research Center [email protected] Accurate and individualized risk prediction is fundamental for successful prevention and intervention of many chronic diseases such as cancer and cardiovascular diseases. Cohort data provide useful resources to estimate the population risk; however, studies used for deriving prediction models frequently come from heterogeneous populations with baseline risks differing from that of the general population. For example, a healthy cohort effect has often been observed in large scale cohort studies. In this situation, while the relative risk may be generalizable to the population from which the cohort is sampled, the absolute risk can be biased. We propose an approach that borrows external disease incidence information to estimate the absolute risk, through the time-dependent attributable risk function. Both analytical and simulation results show that the proposed approach has a smaller bias than the cohort-based absolute risk estimators, and the estimator is more efficient. An application of the proposed method to a real data set is also provided. Methods of Analyzing Bivariate Survival Data with Interval Sampling from Population Based Cancer Registry * Hong Zhu1 and Mei-Cheng Wang2 1 The Ohio State University 2 The Johns Hopkins University [email protected] We present methods for analyzing bivariate survival data on age at cancer diagnosis and survival time after cancer, which allow one to make inference on cancer progression using population based cancer registry data. In the collection of cancer registry data, it


is common to identify incidence of cancer within a calendar time interval, and subsequently bivariate or multivariate survival data are entered as part of the patient's information. We consider a sampling scheme where the bivariate data are collected conditional on cancer diagnosis occurring within a calendar time interval. This sampling scheme is referred to as "interval sampling". In the collection of data, the patient's birth date is retrospectively confirmed and death is observed subject to right censoring. It is important to properly account for the bias arising from interval sampling of cancer diagnosis in analyzing this type of data. Under mild model assumptions, we propose a semiparametric weighted empirical estimator for estimating a bivariate survival function. A copula model is developed to incorporate information from censored observations and to study the dependency of the bivariate data. The proposed estimation methods are evaluated by simulations with practical sample sizes. A comprehensive analysis of SEER (Surveillance, Epidemiology and End Results) ovarian cancer registry data is provided using the proposed methods and compared with standard survival analysis methods, which demonstrates the performance of the bias-corrected methods in estimation and analysis. Robust Working Models in Survival Analysis of Randomized Trials Jane Paik Stanford University [email protected] In the context of randomized trials, Rosenblum and van der Laan (2009, Biometrics, 65, 937-945) considered the null hypothesis of no treatment effect on the mean outcome within strata of baseline variables. They showed that hypothesis tests based on linear regression models and generalized linear regression models are guaranteed to have asymptotically correct Type I error regardless of the actual data generating distribution, assuming the treatment assignment is independent of covariates. We focus on another important outcome in randomized trials: the time from randomization until failure, and consider the null hypothesis of no treatment effect on the survivor function conditional on a set of baseline variables. By a direct application of arguments in Rosenblum and van der Laan (2009), we show that hypothesis tests based on multiplicative hazards models with an exponential link, i.e., proportional hazards models, and tests based on a non-exponential link function where the baseline hazard may be parametrized or left unspecified, are asymptotically valid under model misspecification, provided that the censoring distribution is independent of the treatment assignment given the covariates. We also discuss extensions of the robustness results when auxiliary models are needed for correct inference.
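A minimal R sketch of the kind of covariate-adjusted proportional hazards test discussed in the last abstract (simulated randomized-trial data with independent censoring and a generic coxph fit; this is not the authors' derivation):

# Test of treatment effect on survival, adjusting for a baseline covariate
library(survival)
set.seed(7)
n    <- 400
trt  <- rbinom(n, 1, 0.5)                      # randomized assignment
age  <- rnorm(n, 60, 10)
haz  <- 0.05 * exp(-0.4 * trt + 0.02 * (age - 60))
t_ev <- rexp(n, rate = haz)                    # event times
t_c  <- rexp(n, rate = 0.03)                   # censoring independent of treatment given covariates
time <- pmin(t_ev, t_c); status <- as.numeric(t_ev <= t_c)
fit  <- coxph(Surv(time, status) ~ trt + age, robust = TRUE)
summary(fit)$coefficients["trt", ]             # robust (sandwich) inference for the treatment effect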

Session 44: Interface Between Nonparametric and Semiparametric Analysis and Genetic Epidemiology Statistical Methods for Rare Variant Association Testing for Sequencing Data Xihong Lin Harvard School of Public Health [email protected] Sequencing studies are increasingly being conducted to identify rare variants associated with complex traits. The limited power of classical single marker association analysis for rare variants poses a central challenge in such studies. We propose the sequence kernel association test (SKAT), a supervised, flexible, computationally efficient regression method to test for association between genetic ICSA Applied Statistics Symposium 2011, NYC, June 26-29

variants (common and rare) in a region and a continuous or dichotomous trait, while easily adjusting for covariates. As a score-based variance component test, SKAT can quickly calculate p-values analytically by fitting the null model containing only the covariates, and so can easily be applied to genome-wide data. Using SKAT to analyze a genome-wide sequencing study of 1000 individuals, by segmenting the whole genome into 30kb regions, requires only 7 hours on a laptop. Through analysis of simulated data across a wide range of practical scenarios and triglyceride data from the Dallas Heart Study, we show that SKAT can substantially outperform several alternative rare-variant association tests. We also provide analytic power and sample size calculations to help design candidate gene, whole exome, and whole genome sequence association studies. A Shared-Association Model for Genetic Association Studies with Outcome Stratified Samples * Colin O. Wu, Gang Zheng and Minjung Kwak National Heart Lung and Blood Institute [email protected] Genetic association studies in practice often involve multiple traits resulting from a common disease mechanism, and samples for such studies are often stratified based on some of the trait outcomes. In such situations, statistical methods using only one of these traits may be inadequate and lead to under-powered tests for detecting genetic associations. We propose in this paper an estimation and testing procedure for evaluating the shared association of a genetic marker with the joint distribution of multiple traits of a common disease. Specifically, we assume that the disease mechanism involves both quantitative and qualitative traits, and our samples could be stratified based on one of the qualitative traits. Through a joint likelihood function, we derive a class of estimators and test statistics and their asymptotic distributions for testing the shared genetic association on both the quantitative and qualitative traits. Our simulation study shows that the joint likelihood test procedure is potentially more powerful than association tests based on univariate traits. Application of our proposed procedure is demonstrated through the rheumatoid arthritis data provided by the Genetic Analysis Workshop 16 (GAW16). Bayesian Quantitative Trait Loci Mapping for Gene-Gene and Gene-Environment Interactions Fei Zou University of North Carolina at Chapel Hill [email protected] Quantitative traits and complex diseases are affected by the joint action of multiple genes. Most of the available genetic mapping methods only map one or a few QTL simultaneously with up to two-way gene-gene interactions considered, and are therefore not efficient for mapping the key genes influencing such complex traits. The identification of these genes is a very large variable selection problem: for p potential genes, with p being in the hundreds or thousands, there are 2^p possible main effect models. In this talk, we introduce a Bayesian variable selection approach for semiparametric genetic mapping. The approach allows us to select genetic variants that are not necessarily all individually important but rather are important together.
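To give a concrete feel for the kernel-machine score statistic underlying SKAT as described in the first abstract of this session (a heavily simplified sketch with an intercept-only null model, a weighted linear kernel, and a permutation p-value in place of the analytic mixture-of-chi-squares calculation that the method actually uses; genotypes and trait are simulated):

# Variance-component (kernel) score statistic for a region of rare variants, continuous trait
set.seed(8)
n <- 500; m <- 20
maf <- runif(m, 0.005, 0.03)
G <- sapply(maf, function(f) rbinom(n, 2, f))        # n x m genotype matrix
y <- 0.4 * G[, 1] + rnorm(n)                         # one causal variant
w <- dbeta(maf, 1, 25)                               # a common choice of variant weights
r <- y - mean(y)                                     # residuals from the null (intercept-only) model
Q <- sum((crossprod(G, r) * w)^2)                    # Q = r' G W^2 G' r
Qperm <- replicate(1000, {
  rp <- sample(r)
  sum((crossprod(G, rp) * w)^2)
})
mean(Qperm >= Q)                                     # permutation p-value for the region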

Session 45: High Dimensional Statistics in Genomics Use of Orthogonal Statistics in Testing for Gene-Environment Interactions * James Dai, Charles Kooperberg, Michael LeBlanc and Ross Prentice

Fred Hutchinson Cancer Research Center [email protected] Various two-stage multiple testing procedures have been proposed to screen for gene-gene or gene-environment interactions in genome-wide association (GWA) studies, though their theoretical properties have not been elaborated. In this article, we discuss conditions that are required to achieve strong control of the Family-Wise Error Rate (FWER) by such procedures. We propose a unified estimating equation approach to prove the asymptotic independence between screening statistics, for example marginal association (MA), and interaction statistics, for example case-only estimators, in a broad range of sampling plans and regression models. In case-control studies nested within a randomized clinical trial, a complementary screening criterion, namely deviation from baseline independence (DBI) in the case-control sample, can be used for discovering significant interactions or main effects. Simulations and an application to a GWA study in the Women's Health Initiative (WHI) are presented to show the utility of the proposed two-stage testing procedures in GWAS and in pharmacogenetic studies. Multiple Testing of Local Maxima for Peak Detection in ChIP-Seq Yulia Gavrilov1 , Clifford A. Meyer2 and * Armin Schwartzman3 1 Tel Aviv University 2 Dana-Farber Cancer Institute 3 Harvard School of Public Health [email protected] A multiple testing approach to peak detection is applied to the problem of detecting transcription factor binding sites in ChIP-Seq data. In the proposed algorithm, after kernel smoothing, the presence of a peak is tested at each observed local maximum, followed by multiple testing correction via the false discovery rate. The adaptation to ChIP-Seq data includes modeling of the data as a Poisson sequence, use of Monte Carlo simulations to estimate the distribution of the heights of local maxima under the null hypothesis for computing p-values of candidate peaks, and local estimation of the background Poisson rate from a Control sample. eQTL Mapping Using RNA-seq Wei Sun University of North Carolina at Chapel Hill [email protected] Using RNA-seq, the expression level of a gene can be measured by the total number of sequence reads mapped to this gene, which we refer to as the Total Read Count (TReC). Traditional eQTL methods developed for microarray data, such as linear regression, can be applied to TReC measurements given that they are properly normalized. In this talk, we show that eQTL mapping by directly modeling TReC using discrete distributions has higher power than the two-step approach: data normalization followed by linear regression. In addition, RNA-seq provides information on allele specific expression (ASE), which is not available from microarray. By combining the information from TReC and ASE, we can computationally distinguish cis- and trans-eQTL and further improve the power of cis-eQTL mapping.
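As a small illustration of the case-only interaction estimator mentioned in the first abstract of this session (simulated data under gene-environment independence in the source population; a sketch only, not the authors' two-stage procedure):

# Case-only estimate of a multiplicative gene-environment interaction
set.seed(9)
n <- 50000
G <- rbinom(n, 1, 0.3); E <- rbinom(n, 1, 0.4)          # independent in the population
logit_p <- -4 + 0.3 * G + 0.4 * E + 0.5 * G * E          # true interaction log-OR = 0.5
D <- rbinom(n, 1, plogis(logit_p))
case_only <- glm(E ~ G, family = binomial, subset = D == 1)
coef(case_only)["G"]                                     # estimates the interaction log-odds ratio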


Session 46: Biomarker Discovery and Individualized Medicine Selective Voting in Convex-Hull Ensembles Improves Classification Accuracy * Ralph L. Kodell, Chuanlei Zhang, Eric R. Siegel and Radhakrishnan Nagarajan University of Arkansas for Medical Sciences [email protected] Biotechnological innovations that have occurred over the past two decades have generated enthusiasm for using genomic and other high-dimensional data to improve the treatment of diseases by tailoring therapies on an individualized basis. Classification algorithms have been developed to predict risks and responses of patients based on such high-dimensional predictor variables. However, these algorithms have yet to demonstrate sufficient predictive ability to be adopted in routine clinical practice. In this presentation it is suggested that the limited success realized so far can be attributed to the fact that classification algorithms generally classify all patients according to the same set of criteria, under an implicit assumption of population homogeneity. Here a different approach is presented that allows for population heterogeneity which may not be readily apparent, and thus not controlled for. A new selective-voting algorithm that allows for the existence of subpopulations that need not be identified in advance is described, in the context of a classification ensemble based on two-dimensional convex hulls of positive and negative training samples. Members of the ensemble are allowed to vote on unknown test samples only if they are located within or behind suitably reduced, or pruned, convex hulls of training samples. Validation of the new algorithm's increased accuracy, sensitivity, specificity, positive predictive value and negative predictive value is carried out using publicly available data with cancer as the outcome variable and expression levels of thousands of genes as the predictors. Quantitative Analysis of Genome-wide Chromatin Remodeling * Songjoon Baek, Myong-Hee Sung and Gordon L. Hager National Cancer Institute, CCR, LRBGE [email protected] Recent high-throughput sequencing technologies have opened the door for genome-wide characterization of chromatin features at an unprecedented resolution. Chromatin accessibility is an important property that regulates protein binding and other nuclear processes. Here we describe statistical methods to analyze chromatin accessibility using DNaseI hypersensitivity by sequencing (DNaseI-seq). Although there are numerous bioinformatic tools to analyze ChIP-seq data, our statistical algorithm was developed specifically to identify significantly accessible genomic regions by handling features of DNaseI hypersensitivity. Without prior knowledge of relevant protein factors, one can discover genome-wide chromatin remodeling events associated with specific conditions or differentiation stages from quantitative analysis of DNaseI hypersensitivity. By performing appropriate subsequent computational analyses on a select subset of remodeled sites, it is also possible to extract information about putative factors that may bind to specific DNA elements within DNaseI hypersensitive sites. These approaches enabled by DNaseI-seq represent a powerful new methodology that reveals mechanisms of transcriptional regulation. Comparative Genetic Pathway Analysis Xiao Wu State University of New York at Stony Brook [email protected]


Xiao Wu∗, Kathryn Sharpe∗, Ellen Li∗∗, Tianyi Zhang∗, Hongyan Chen∗, Safiyh Taghavi†, Daniel van der Lelie†, Wei Zhu∗ (Dr. Wei Zhu is the presenting author) ∗Department of Applied Mathematics and Statistics, ∗∗Department of Medicine, Stony Brook University, Stony Brook, NY, 11794, USA; †Biology Department, Brookhaven National Laboratory, Upton, NY, 11793, USA In this work, we propose a novel genetic pathway discovery and comparison analysis framework integrating newly generated gene expression microarray data and existing biological pathway information. Starting with the significance analysis of microarrays (SAM), a list of differentially expressed genes among groups is obtained. This gene list is then imported into the Ingenuity Pathway Analysis (IPA) software to yield potentially relevant biological pathways. Finally, a newly developed covariate structural equation modeling method is applied to evaluate gene-gene interactions and group differences. We illustrate this novel comparative pathway analysis pipeline using whole human genome expression profiling data collected from a study of inflammatory bowel diseases (IBD) with 99 subjects from three phenotypic groups: ileal Crohn's disease (CD), ulcerative colitis (UC) and non-IBD controls.

Session 47: Applications in Spatial Statistics Modeling the Spread of Plant Disease Using a Sequence of Binary Random Fields with Absorbing States Mark S. Kaiser Iowa State University [email protected] The problem addressed is modeling the spread of bean pod mottle virus in soybeans over space and time. The transmission vector for this disease is a type of beetle. At any given time point, a binary Markov random field can be formulated. A difficulty with extending this structure to multiple time points is that once a location becomes infected it remains infected at all future times. Such absorbing states violate support conditions needed to formulate a Markov random field over both space and time in this problem. Using a combination of Markov conditioning in time and Markov random fields for variable lattice definitions over time allows the construction of a joint distribution that has the specified conditional distributions. Simulation-based model assessment is used to compare four models that represent the underlying process of disease spread in different ways. Autologistic Models for Binary Data on a Lattice * John Hughes1 , Murali Haran2 and Petrutza Caragea3 1 University of Minnesota 2 The Pennsylvania State University 3 Iowa State University [email protected] The autologistic model is a Markov random field model for spatial binary data. Because it can account for both statistical dependence among the data and for the effects of potential covariates, the autologistic model is particularly suitable for problems in many fields, including ecology, where binary responses, indicating the presence or absence of a certain plant or animal species, are observed over a two-dimensional lattice. We consider inference and computation for two models: the original autologistic model due to Besag, and the centered autologistic model proposed recently by Caragea and Kaiser. Parameter estimation and inference for these models is a notoriously difficult problem due to the complex form of the likelihood function. We study pseudolikelihood (PL), maximum likelihood (ML), and Bayesian approaches to inference and describe

ways to optimize the efficiency of these algorithms and the perfect sampling algorithms upon which they depend, taking advantage of parallel computing when possible. We conduct a simulation study to investigate the effects of spatial dependence and lattice size on parameter inference, and find that inference for regression parameters in the centered model is reliable only for reasonably large lattices (n > 900) and no more than moderate spatial dependence. When the lattice is large enough, and the dependence small enough, to permit reliable inference, the three approaches perform comparably, and so we recommend the PL approach for its easier implementation and much faster execution. Theory and Practice for Massive Spatial Data Hao Zhang Purdue University [email protected] Spatial statistics today deals with far more complex problems than decades ago. This has to do with the prevalence of spatial data in an increasing number of disciplines—technology has made the collection and archiving of spatial data feasible and less expensive. Spatial data can be huge in size, which presents a challenge to both likelihood-based and Bayesian inferences because a large covariance matrix is involved in the inferences. Consequently, approximate models and inferences have been proposed, some of which were motivated by infill asymptotic results and some by computing techniques. I will review some of these methods and some recent work on multivariate spatial models. Non-Parametric Estimation of Spatial Covariance Function * Zhengyuan Zhu and Li Yang Iowa State University [email protected] In spatial statistics, estimating the covariance structure is of fundamental importance. In this talk we review existing methods for nonparametric estimation of the spatial covariance function for both regularly spaced and irregularly spaced data, and propose new estimators which give improved performance in terms of both covariance estimation and spatial prediction. Both simulation studies and theoretical results will be presented.
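To make the pseudolikelihood (PL) approach for the autologistic model concrete (a sketch on placeholder lattice data with a first-order neighbourhood; this is not the authors' centered parameterization nor their perfect-sampling machinery), maximum pseudolikelihood reduces to an ordinary logistic regression of each site on its covariates and neighbour sum:

# Maximum pseudolikelihood for an autologistic model on a regular lattice via logistic regression
set.seed(10)
nr <- 30; nc <- 30
y <- matrix(rbinom(nr * nc, 1, 0.5), nr, nc)           # placeholder binary field (illustration only)
nbr_sum <- function(y) {                               # sum of the four nearest neighbours, zero-padded
  below <- rbind(y[-1, ], 0); above <- rbind(0, y[-nr, ])
  right <- cbind(y[, -1], 0); left  <- cbind(0, y[, -nc])
  below + above + right + left
}
x <- matrix(rnorm(nr * nc), nr, nc)                    # a covariate observed on the lattice
dat <- data.frame(y = as.vector(y), x = as.vector(x), s = as.vector(nbr_sum(y)))
fit <- glm(y ~ x + s, family = binomial, data = dat)   # each site regressed on covariate and neighbour sum
coef(fit)                                              # intercept, covariate effect, dependence parameter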

Session 48: The Totality of Evidence in Safety and Efficacy Evaluation of Medical Products Collective Evidence Qian Li U.S. Food and Drug Administration qian.li@ Abstract: In this session, we explore statistical decision rules that are based on the totality of evidence from multiple clinical trials and multiple doses, endpoints, and tests. The discussion covers the philosophy of statistical thinking in drug evaluation with the totality of data, and the logic that is driven by this philosophy and used in the error calculation. The strength of the proposed decision rules based on collective evidence will be presented with illustrations. Disclaimer: The views expressed in this session are the author's own, which do not necessarily represent the views of the FDA. Adaptive Statistical Methods for Control of Type I Error Rate for Both Multiple Primary and Secondary Endpoints * Abdul J Sankoh and Haihong Li Vertex Pharmaceuticals Inc. abdul [email protected]

In randomized clinical trials, efficacy response variables are often classified as primary and secondary endpoints. The primary endpoints directly address the primary study objective. The focus of the study design and the primary statistical analysis method, including the study power calculation, is on these endpoints. The sought indication and subsequent labeling claim are often limited to trial findings based on these primary endpoints. On the other hand, while the secondary endpoints serve a number of important roles toward a broader and more comprehensive understanding of the drug effect, the study design (and often the primary analysis method) is not linked directly to them, and so findings based on these do not generally lead to a labeling claim. This is especially so when the trial fails to demonstrate clinical and statistical significance with the protocol-specified primary endpoint(s). We discuss in this presentation criteria for the classification of response variables into primary and secondary endpoints, and recent advancements in statistical methods for strong control of the type I error rate for the primary and secondary hypothesis testing that could allow both primary and secondary endpoint-specific labeling claims.
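A toy R sketch in the spirit of the multiplicity-control strategies discussed above (hypothetical p-values and endpoint names; a simple gatekeeper on the co-primary family followed by a Hochberg adjustment on the secondary family, not the specific procedures proposed by the authors):

# Gatekeeping: test secondary endpoints only if both co-primary endpoints succeed
p_primary   <- c(pain = 0.012, physical_function = 0.020)     # hypothetical co-primary p-values
p_secondary <- c(sleep = 0.030, rescue_med = 0.004, QoL = 0.060)
alpha <- 0.05
primaries_win <- all(p_primary <= alpha)                      # co-primaries treated as an intersection-union test
if (primaries_win) {
  p_adj <- p.adjust(p_secondary, method = "hochberg")         # Hochberg step-up on the secondary family
  print(p_adj <= alpha)
} else {
  message("Co-primary endpoints not both significant; secondary endpoints are not tested.")
}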

Session 49: Estimating Treatment Effects in Randomized Clinical Trials with Non-compliance and Missing Outcomes Analyzing Randomized Clinical Trials with Non-Compliance and Non-ignorable Missing Data * Yan Zhou1 , Roderick J.A. Little2 and John D. Kalbfleisch2 1 U.S. Food and Drug Administration, CDER 2 Department of Biostatistics, University of Michigan [email protected] We consider the analysis of clinical trials that involve randomization to an active treatment or a control treatment, where the active treatment is subject to all-or-none compliance. Besides compliance being unobserved in the control treatment group, subsequent non-response on the outcome makes estimating the treatment effect more complicated. Under the assumption of latent ignorability, we consider four different restrictions: (1) compound exclusion restriction (ER) for the outcome (Y) and the missingness of Y (MY); (2) no compliance effect in controls (NCEC) for Y and MY; (3) ER for Y and NCEC for MY; (4) NCEC for Y and ER for MY. ML estimates and method-of-moments estimates are discussed under these restrictions. Sensitivity analyses for these four restrictions are also considered and the methods are applied to a study examining the efficacy of clozapine versus haloperidol in the treatment of refractory schizophrenia. Handling Incomplete Data in Vaccine Clinical Trials * Ivan S.F. Chan, Xiaoming Li, William W.B. Wang and Frank G.H. Liu Merck Research Laboratories ivan [email protected] In clinical trials, incomplete data occur almost inevitably for various reasons. It is important to take the missing data into consideration and to prospectively account for them in the analyses. Vaccines are biological products that are primarily designed to prevent diseases, and are different from pharmaceutical products. While similarities exist between clinical trials for vaccines and pharmaceutical products, there are some unique issues in vaccine trials, including how to handle incomplete data. In this presentation we will discuss some statistical approaches for the analysis of vaccine immunogenicity trials in the presence of incomplete data. In particular, we will discuss the constrained longitudinal data analysis method for adjusting


baseline values with missing data, as well as methods for handling truncated and censored assay data. Real examples will be used to illustrate the proposed methods.
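A bare-bones R sketch of the all-or-none compliance setting from the first abstract in this session (simulated data and the simple instrumental-variable (Wald) estimator under an exclusion restriction, ignoring missing outcomes; this is not the likelihood or method-of-moments estimators studied by the authors):

# Complier-average causal effect with all-or-none compliance
set.seed(11)
n <- 2000
Z <- rbinom(n, 1, 0.5)                      # randomized arm (1 = offered active treatment)
C <- rbinom(n, 1, 0.7)                      # latent complier status
A <- Z * C                                  # treatment actually received (no access in control arm)
Y <- 0.2 + 0.5 * A + 0.3 * C + rnorm(n)     # outcome; effect acts only through received treatment
itt   <- mean(Y[Z == 1]) - mean(Y[Z == 0])  # intention-to-treat effect
pcomp <- mean(A[Z == 1])                    # compliance rate in the offered arm
cace  <- itt / pcomp                        # Wald/IV estimate of the complier-average effect
c(ITT = itt, compliance = pcomp, CACE = cace)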

Session 50: Analysis of Biased Survival Data Semi-Parametric Modelling for Length-Biased Data * Yu Shen1 , Jing Ning2 and Jing Qin3 1 The University of Texas MD Anderson Cancer Center 2 The University of Texas School of Public Health 3 National Institutes of Health [email protected] Length-biased time-to-event data are commonly encountered in applications ranging from epidemiologic cohort studies or cancer prevention trials to studies of labor economics. A longstanding statistical problem is how to assess the association of risk factors with survival in the target population given the observed length-biased data. In this talk, we describe how to estimate these effects under commonly used semiparametric models such as the Cox proportional hazards and AFT models. Nonparametric Estimation for Right-Truncation Data: a Pseudo-Partial Likelihood Approach Wei-Yann Tsai1 , Kam-Fai Wong2 and * Yi-Hsuan Tu3 1 Department of Biostatistics, Columbia University 2 Institute of Statistics, National University of Kaohsiung 3 Department of Statistics, National Cheng Kung University [email protected] We propose a pseudo-partial likelihood approach to estimate proportional hazards models with right-truncated data. Instead of considering the residual lifetime, we mimic the marginal density of the observed lifetime from Tsai (2009). This approach uses the full information of the data and also retains the simplicity of conditional analysis. We discuss the performance and properties of the proposed estimator through simulation studies. Kernel Density Estimation with Doubly Truncated Data Carla Moreira and * Jacobo de Uña-Álvarez University of Vigo [email protected] In some applications with astronomical and survival data, doubly truncated data are encountered. In this work we introduce kernel-type density estimation for a random variable which is sampled under random double truncation. Two different estimators are considered. As usual, the estimators are defined as a convolution between a kernel function and an estimator of the cumulative distribution function, which may be the NPMLE (Efron and Petrosian, 1999) or a semiparametric estimator (Moreira and de Uña-Álvarez, 2010). Asymptotic properties of the introduced estimators are explored. Their finite sample behaviour is investigated through simulations. A real data illustration is included. Various Inferences from the Forward and Backward Recurrence Times in a Prevalent Cohort Study with Follow-up * David Wolfson1 , Vittorio Addona2 , Masoud Asgharian1 and Juli Atherton1 1 McGill University 2 Macalester College [email protected] In a prevalent cohort study with follow-up, subjects with prevalent disease are identified cross-sectionally and then followed forward in time until failure or


right censoring. We first assume that survival has remained constant prior to "prevalence day." By using only the observed forward (possibly censored) and backward recurrence times, we show that it is possible to test whether the underlying incidence process (that is, the disease onset process) is a stationary Poisson process. Under such stationarity, we show how to estimate the incidence rate. Conversely, assuming stationarity, we show how to test for constancy in survival, again using the forward and backward recurrence times. We illustrate our results through simulations and data collected on survival with dementia as part of the Canadian Study of Health and Aging.
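To illustrate the length-bias correction that underlies several talks in this session (a sketch of the simplest uncensored case, in which sampled durations are weighted by the inverse of their length; simulated data only, not the semiparametric estimators described above):

# Inverse-length weighting recovers the population distribution from length-biased draws
set.seed(12)
pop <- rgamma(200000, shape = 2, rate = 1)                  # unobserved population of survival times
lb  <- sample(pop, 2000, replace = TRUE, prob = pop)        # length-biased sample: longer times over-sampled
w   <- (1 / lb) / sum(1 / lb)                               # normalized inverse-length weights
c(naive_mean = mean(lb),                                    # biased upward
  weighted_mean = sum(w * lb),                              # approximately the population mean (= 2)
  pop_mean = mean(pop))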

Session 51: Suicide Research Methodology Methodological Issues in Suicide Research in Mainland China Michael R. Phillips Shanghai Mental Health Center, Shanghai Jiao Tong University [email protected] This presentation will discuss biostatistical issues that arise in three different types of suicide research in mainland China. 1) Estimating the number of suicides. China does not have a national mortality registry system so estimation of the suicide rates (and mortality rates from other causes) must be based on projecting suicide rates in the death registry system of the Ministry of Health (MoH)—which only covers 10% of the population—to the total population. But the death registry of the MoH is not representative of the total population and there are different estimates of total mortality so there is doubt about how best to adjust the available data from the MoH. 2) Psychological Autopsy data. Unlike studies of suicide decedents in other countries that have data from multiple sources, in China the only data about the circumstances prior to death comes from interviews with family members and close associates of the deceased. Assessing the consistency of reports across informants, identifying the factors that affect informants’ reports of previous psychological problems in the decedents, and deciding how to combine different reports are some of the methodological problems that need to be resolved when analyzing these data. 3) Dealing with cross-cultural issues. The experience and manifestation of psychosocial factors vary cross-culturally so it is often necessary to adapt western psychometric scales used in suicide research for use with Chinese populations. This can be a very lengthy process that results in substantial changes to the original scales. Comparing the psychometric properties of the revised scales to those of the original scales, and to those of the translated but unadjusted versions of the original scales can be quite complex, but it may highlight fundamental differences in the formulation of important determinants of suicide such as depressive affect and lack of social support. Predicting Short Term and Long Term Risk of Suicide Attempt after a Major Depressive Episode Hanga Galfalvy Department of Psychiatry, Columbia University [email protected] Summary: We compare prognostic models built to predict “early” and “late” suicide attempts after a Major Depressive Episode in mood disorder patients. Better predictive performance is expected for early attempts, since patients’ state measures change over time, but clinical subgroups may have different hazard functions. Method: Using data from a 2-year prospective study of 384 depressed patients, Cox proportional hazard regression models and survival trees were developed to predict future suicide attempts 1. ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts during the first six months after baseline assessment, and 2. during the six months to two year timeframe. Statistical validity of the models was evaluated based on cross-validated predicted probabilities. Latent class analysis of seven baseline characteristics of suicidal behavior and co-morbid psychiatric diagnoses/traits was performed and yielded two high-risk classes of suicide attempters together with a low-risk attempter group. Kaplan-Meier estimates of the survival curves for the follow-up period were computed for the three latent classes of suicide attempters and for those without past attempt. Results: Models trained separately on early and late attempts outperformed the model trained on all attempts. Cox regression models outperformed survival trees in predicting both early and late attempts. Overall discrimination performance was comparable for Cox models built on early and late attempts. The survival curves for the two high-risk groups defined on baseline data were significantly different, one of them following a constant hazard and the other one was characterized by high initial risk that declined and was not significantly different from the low-risk group after the first 6 months. Conclusions: Using separate risk models for proximal and late suicide attempt can improve performance over predicting a common risk model but will still yield suboptimal results. A better estimate of the person-level proximal risk of suicide attempt can be obtained by separating the population into subgroups based on clinical subtypes. The Controversy about Antiepileptic Drugs and Suicide: Asking the Right Question and Addressing Confounding by Multiple Indications � Sue M. Marcus1 and Robert D. Gibbons2 1 Columbia University 2 University of Chicago [email protected] In January 2008, the FDA (on the basis of evidence from a metaanalysis of 199 placebo-controlled AED trials of 11 AEDs) issued a safety alert regarding the risk for increased suicidality among those taking antiepileptic medication, regardless of AED type and indication. However, recent studies following the alert drew conflicting conclusions (Gibbons et al, 2009; Patorno et al, 2010; Gibbons et al, 2010; Arana et al, 2010). We show how failure to ask the right question and failure to address the multiple indications for AEDs (eg bipolar disorder, epilepsy disorders, pain disorders, depression, alcohol abuse) can lead to misleading conclusions that may do more harm than good.

Session 52: Joint Modeling of Longitudinal and Time-to-Event Data in Medical Research Joint Analysis of Longitudinal Growth and Interval Censored Mortality Data � Darby Thompson1 , Charmaine Dean1 , Terry Lee2 and Leilei Zeng3 1 Simon Fraser University 2 St. Paul's Hospital 3 University of Waterloo [email protected] Joint analysis of longitudinal and survival data has received considerable attention in the recent literature. This talk will review methods developed for such joint analysis and develop a joint model for the analysis of longitudinal data monitoring growth, and survival, subject to various interventions in a designed experiment. Of interest is the development of methods to handle features of the data

which are not common in considerations of joint analyses. A main feature is interval censoring of the survival response. We adopt linkages in random effects over multiple outcomes to develop a joint modelling framework, and handle interval censoring in the joint longitudinal-survival context using imputation methods. Properties of the EM algorithmic scheme for estimation are considered. We also discuss the conditions under which there are efficiency gains in joint analyses with regard determination of treatment effects. Robust Inference for Longitudinal Data Analysis with NonIgnorable and Non-Monotonic Missing Values Chi-Hong Tseng1 , � Robert Elashoff1 , Ning Li0 and Gang Li1 1 University of California, Los Angeles [email protected] A common problem in the longitudinal data analysis is the missing values due to subject’s missed visits and loss to follow up. Although many novel statistical approaches have been developed to handle such data structures in recent years, few methods are available to provide robust inference in the presence of outlying observations. In this paper we propose two methods, t-distribution model and robust normal model, for robust inference with non-ignorable nonmonotonic missing data problems in longitudinal studies. These methods are conceptually simple and computationally straight forward. We also conduct simulation studies and use a real data example to demonstrate the performance of these methods. An Exploration of Fixed and Random Effects Selection for Longitudinal Binary Outcomes in the Presence of Non-Ignorable Dropout � Ning Li1 , Michael Daniels2 , Gang Li3 and Robert Elashoff3 1 Cedars-Sinai Medical Center 2 University of Florida 3 University of California, Los Angeles [email protected] We explore a Bayesian approach to fixed and random effects selection for longitudinal binary outcomes that are subject to missing data caused by dropouts. We show via simulation that non-ignorable missing data lead to biased parameter estimates and thus result in selection of the wrong effects. By jointly modeling the longitudinal binary data with the dropout process, one is able to correct the bias in estimation and selection of fixed and random effects if the missing data are non-ignorable. We illustrate the approach using a clinical trial for acute ischemic stroke. One Scenario in Which Joint Modeling Is Unnecessary � Joel A Dubin1 and Xiaoqin Xiong2 1 University of Waterloo 2 Information Management Services, Inc. [email protected] There are many scenarios when analyzing a time-varying covariate and survival response to consider joint modeling. However, if one’s goal is to consider the association, possibly lagged, between a timevarying covariate and a recurrent event survival response, then an alternative method is to employ a binning-based smoothing step before applying a more traditional model for generalized longitudinal response data. In this talk, we discuss the binning method, why it was used over a joint model in a nephrology application of interest, and the useful interpretation of the results from this application. Time permitting, we will also discuss some simulation results, providing guidelines on when this binning approach will (and will not) be useful under more general settings.
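A minimal data-generating sketch of the shared-random-effect linkage discussed in this session (Python, with invented parameters): a subject-level random effect enters both the longitudinal growth trajectory and the survival hazard, and the event time is only observed to lie between inspection visits, mimicking interval censoring. None of the estimation machinery (EM, imputation, binning) from the talks is reproduced here.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200
    visits = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

    b = rng.normal(0.0, 0.5, size=n)                     # shared subject-level random effect
    growth = 10 + 2.0 * visits[None, :] + b[:, None] + rng.normal(0, 0.3, (n, len(visits)))

    # survival: hazard depends on the same random effect (linkage coefficient 0.8, hypothetical)
    rate = 0.2 * np.exp(0.8 * b)
    t_event = rng.exponential(1.0 / rate)

    # interval censoring: event time known only to fall between successive inspection visits
    left = np.zeros(n)
    right = np.full(n, np.inf)
    for i, t in enumerate(t_event):
        earlier = visits[visits < t]
        later = visits[visits >= t]
        left[i] = earlier.max() if earlier.size else 0.0
        right[i] = later.min() if later.size else np.inf  # inf = censored after the last visit
    print(growth.shape, left[:3], right[:3])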


Session 53: Survey Research Method and Its Application in Public Health


The Use of Multiple Imputation to Adjust for Biases in Estimating the Long-Term Trend of Lung Cancer Incidence by Histologic Type from Population Based Data � Mandi Yu, Eric J. (Rocky) Feuer, Kathleen Cronin and Neil Caporaso National Cancer Institute [email protected] Data from the Surveillance Epidemiology and End Results (SEER) Program are frequently used to provide national estimates of incidence rates of lung cancer by histologic types. SEER registries have been coding the form of cancer according to the International Classification of Diseases for Oncology (ICD-O). In ICD-O-3, a new morphologic term (8046: non-small cell carcinoma (NSCC)) is added to group carcinoma cases that cannot be classified beyond the exclusion of small cell. Because of this addition, the long-term trends for lung cancer histologic subtypes estimated without appropriate statistical adjustments could be biased. We selected 423,396 histology confirmed malignant lung cancer cases diagnosed during 1975-2007 from SEER 9 registries. Five in situ cases were excluded due to small sample size. We redistributed 54,527 cases, coded as 8046 or 8010 (carcinoma, not otherwise specified), which is often used for unspecified NSCC cases prior to ICD-O-3, into one of the specific NSCC subgroups: squamous cell carcinoma (SCC), adenocarcinoma, large cell carcinoma (LCC), and other specified non-small cell carcinoma (OSNSCC) using multiple imputation (MI) techniques. We developed the MI method based on sequential regressions (Raghunathan et al. 2001). The imputations were conditional on registry, diagnosis year, age, sex, marital status, nativity, race, Hispanic ethnicity, tumor size, staging, grade, survival, surgery, and county-level smoking prevalence estimates. Joinpoint analyses of trends were performed on the age-adjusted rates estimated from multiply imputed datasets. The rates for SCC, LCC, and OSNSCC still exhibit a decreasing trend, although less rapidly, after imputation. In contrast, the incidence trend for adenocarcinoma after 1992 changes from significantly decreasing (APC=-0.74) to a slightly increasing (APC=0.23), though nonsignificant, tendency. Since 2001, the rates for specific NSCC subtypes were underestimated because a disproportionately large number of cases were coded as 8046 instead of being assigned to a specific histology. The impact is greater for adenocarcinoma because it has become the most common type of lung cancer in recent years. Our analysis shows that MI is a viable tool to correct for the bias due to coding inconsistency in incidence trend analysis.
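The toy sketch below (Python, fabricated counts) mimics only the impute-and-combine logic of the approach just described: unspecified NSCC cases are repeatedly reallocated to specific subtypes with probabilities taken from the specifically coded cases, and the M imputed estimates are combined with Rubin's rules. The actual analysis conditions on many covariates through sequential regressions; none of that richness is shown here.

    import numpy as np

    rng = np.random.default_rng(2)
    specific_counts = np.array([300, 450, 120, 80])     # hypothetical SCC / Adeno / LCC / OSNSCC cases
    n_unspecified = 200                                  # hypothetical 8046/8010 cases
    M = 20

    props = specific_counts / specific_counts.sum()
    est, var = [], []
    for m in range(M):
        # draw a reallocation of the unspecified cases (stand-in for the regression-based draw)
        alloc = rng.multinomial(n_unspecified, props)
        total = specific_counts + alloc
        p_adeno = total[1] / total.sum()
        est.append(p_adeno)
        var.append(p_adeno * (1 - p_adeno) / total.sum())  # within-imputation variance

    est, var = np.array(est), np.array(var)
    qbar = est.mean()                                    # Rubin's rules: combined estimate
    B = est.var(ddof=1)                                  # between-imputation variance
    T = var.mean() + (1 + 1 / M) * B                     # total variance
    print(round(qbar, 4), round(np.sqrt(T), 4))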

Identifying Implausible Gestational Ages with Reversible Jump MCMC � Guangyu Zhang1 , Nathaniel Schenker1 , Jennifer D. Parker1 and Dan Liao2 1 National Center for Health Statistics 2 University of Maryland [email protected] Birth weight and gestational age are two important covariates used in obstetrics research and clinical practice. Although birth weights of newborns can be measured accurately, gestational ages for the US birth data are generally based on the mothers' recall of the last menstrual period. This process has been shown to introduce random or systematic errors and, consequently, lower the data quality. In order to mitigate those errors, we extend two mixture models proposed by Tentoni et al (2004) and Platt et al (2001) using reversible jump MCMC. We conduct simulation studies and apply our method to the 2002 US birth data from the National Vital Statistics System. Our research findings provide useful statistical tools to both evaluate the data quality and correct the misspecification problem.

Best Predictive Small Area Estimation Jiming Jiang1 , � Thuan Nguyen2 and J. Sunil Rao3 1 University of California, Davis 2 Oregon Health Sciences University 3 University of Miami [email protected] The observed best prediction (OBP) is a new prediction procedure for small area estimation. We show that the best predictive estimator (BPE) is more reasonable than the traditional estimators derived from estimation considerations, such as maximum likelihood and restricted maximum likelihood, if the main interest is estimation of small area means. We use both theoretical derivations and empirical studies to demonstrate that the OBP can significantly outperform the empirical best linear unbiased prediction (EBLUP) in terms of the mean squared prediction error (MSPE), if the underlying model is misspecified. A general theory about OBP is developed. Estimation of the area-specific MSPE of the OBP is considered. A real data example is discussed.


Efficient Analysis of Case-Control Studies with Sample Weights � Victoria Landsman and Barry Ira Graubard National Cancer Institute [email protected] Analysis of population-based case-control studies with complex sampling designs is challenging because the sample selection probabilities (and, therefore, the sample weights) depend on the response variable and covariates. Commonly, the design-consistent (weighted) estimators of the parameters of the population regression model are obtained by solving (sample) weighted estimating equations. Weighted estimators, however, are known to be inefficient when the weights are highly variable as is typical for casecontrol designs. In this paper we propose a new semi-parametric weighted estimator which incorporates modeling of the sample expectations of the weights in a design-based framework. The estimator has a theoretical basis and is robust to model misspecification. We also describe a sample pseudo maximum likelihood estimator for inference from case-control data, which generalizes the Breslow and Cain (1988) estimator and can be applied to the data from general sampling designs, including cluster samples. We discuss benefits and limitations of each of the two proposed estimators emphasizing efficiency and robustness. We compare the finite sample properties of the two new estimators and existing weighted estimators using simulations under various sample plans. The methods are applied to the U.S. Kidney Cancer Case-Control Study to identify risk factors.
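The new semiparametric estimator of the abstract above is not reproduced here; purely for reference, the sketch below solves the standard design-weighted (pseudo-likelihood) logistic estimating equations by Newton iterations on fabricated data, i.e., the weighted baseline that the proposed methods aim to improve upon. Data, weights, and coefficients are all hypothetical.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 500
    X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
    y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.8 * X[:, 1]))))
    w = rng.uniform(0.5, 5.0, size=n)                        # hypothetical sample weights

    beta = np.zeros(2)
    for _ in range(25):                                      # Newton-Raphson on the weighted score
        p = 1 / (1 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))
        info = (X * (w * p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    print("weighted logistic estimate:", np.round(beta, 3))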

Session 54: J P Hsu Memorial Session Combined Estimation of Treatment Effect Under a Discrete Random Effects Model � K. K. Gordon Lan, Jose Pinheiro Johnson & Johnson [email protected] Leveraging information from different experiments to improve knowledge about an underlying phenomenon is certainly appealing from a scientific point of view, but also carries with it a certain amount of risk that the different sources cannot be coherently

Abstracts combined. We will give a brief review of the fixed effects model and the DerSimonian - Laird (1986) random effects model in meatanalysis, then propose a Level I random effects model with a discrete distribution. The two random effects approaches lead to different weights for the individual treatment effect estimates and different approaches for characterizing the precision of the resulting combined estimate. The proposed model and corresponding assumptions may be more plausible in certain practical situations and, in some ways, provide a compromise between the fixed and the DerSimonian - Laird approaches, avoiding their respective pitfalls. A New Approach to Margin Specification and Testing of Associated Hypotheses � George Y.H. Chi1 , Gang Li2 1 Johnson & Johnson Pharmaceutical R&D, L.L.C. 2 Johnson & Johnson LifeScan This presentation will briefly review the history of active control trials at FDA, discuss the fundamental problems of such trials, and the issue of margin specification in NI or active control trials. The talk will describe the recent results of Li and Chi (June, 2011) on inferiority index and show how inferiority index can prove to be helpful in the current challenge encountered in (NI) margin specification by minimizing the subjectivity involved in the procedures and making them more transparent and consistent. Examples from the FDA Guidance on NI Trials will be given to illustrate its applications. Some new results and additional areas of research and applications will be mentioned. Targeted Maximum Likelihood Estimation: Estimatoin of causal effects in observational and experimental studies � Mark van der Laan, Ori Stittelman University of California, Berkeley [email protected] In this talk we present targeted maximum likelihood based estimators of a causal effect defined in realistic semiparametric models for the data generating experiment, that takes away the need for relying on parametric regression models. Fundamental concepts underlying this methodology are careful definition of the target parameter of the data generating distribution in a realistic semiparametric model, super Learning, i.e., the use of cross-validation to select optimal combinations of many candidate estimators, and subsequent targeted maximum likelihood estimation to target the fit of the data generating distribution towards the causal effect/target parameter of interest. The TMLE is a locally efficient double robust substitution estimator and has excellent finite sample robustness properties due to being a substitution estimator. We demonstrate the performance in simulation studies. We also illustrate this method for assessing causal effects of treatment on clinical outcomes in RCT and observational studies in HIV. In particular, we demonstrate the TMLE that allows the right-censoring/drop-out to be a function of timedependent covariates. (�Student Paper Award) Estimating Subject-Specific Treatment Differences for Risk-Benefit Assessment with Competing Risk Event-Time Data � Brian Claggett1 , Lihui Zhao1 , Lu Tian2 , Davide Castagno3 and Lee-Jen Wei1 1 Harvard School of Public Health 2 Stanford University School of Medicine 3 Brigham and Women’s Hospital and University of Turin [email protected] To evaluate treatment efficacy or toxicity using event-time data from a randomized comparative study, we usually make inference about a ICSA Applied Statistics Symposium 2011, NYC, June 26-29

summary measure which quantifies an overall treatment difference. However, a single measure for efficacy, even when coupled with that for toxicity, is difficult to be utilized for treating a future patient at his or her bedside. A positive (negative) study result based on such a measure does not mean that every future subject should (should not) be treated by the new therapy. For clinical practice, it is desirable to identify subjects who would benefit from the new treatment from a risk-benefit perspective. In this paper, we propose a systematic approach to achieve this goal using competing risk event-time data from two similar, but independent studies. We first utilize data from a study to build a parametric score with respect to a primary event for the purpose of stratifying the patients in the second study. We then use the data from the second study to obtain a nonparametric estimate of the treatment difference, with respect to each competing risk event, for any fixed score. Furthermore, confidence interval and band estimates are constructed to quantify the uncertainty of our inferences for the treatment differences over a range of scores. To illustrate the new proposal, we use the data sets from two cardiovascular studies for evaluating specific beta-blockers in patients with heart failure. The score is based on time to death, and the competing events are myocardial infarction, hospitalization and toxicity.
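For reference, the DerSimonian-Laird random-effects combination discussed in the opening abstract of this session can be written in a few lines; the study-specific estimates and variances below are made up, and the discrete random-effects alternative proposed in the talk is not shown.

    import numpy as np

    y = np.array([0.30, 0.10, 0.45, 0.25])      # hypothetical study-specific treatment effects
    v = np.array([0.02, 0.05, 0.03, 0.04])      # their within-study variances

    w = 1 / v
    fixed = np.sum(w * y) / np.sum(w)            # fixed-effects (inverse-variance) estimate
    Q = np.sum(w * (y - fixed) ** 2)             # Cochran's Q
    k = len(y)
    tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

    w_re = 1 / (v + tau2)                        # DerSimonian-Laird random-effects weights
    re = np.sum(w_re * y) / np.sum(w_re)
    se_re = np.sqrt(1 / np.sum(w_re))
    print(round(fixed, 3), round(tau2, 4), round(re, 3), round(se_re, 3))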

Session 55: Statistics in Environmental, Financial and Social Science Cox Model with Point-Impact Predictor: Numerical Aspect � Yulei Zhang and Ian W. McKeague Columbia University [email protected] We briefly review theoretical results (i.e., asymptotics) of the Cox model with point-impact predictor. A point-impact predictor, as a covariate in the Cox model, is a single point out of some observed rough trajectory (depicted by fractional Brownian motion) for each subject's characteristic variable. The limitation of the asymptotics is studied, and simulations for different scenarios are conducted to evaluate our theoretical results. Potential applications of the proposed method include genetics, network traffic and finance. Modular Latent Structure Analysis Tianwei Yu Emory University [email protected] Modular structures are ubiquitous across various types of biological networks. The study of network modularity can help reveal regulatory mechanisms in systems biology, evolutionary biology and developmental biology. Identifying putative modular latent structures from high-throughput data using exploratory analysis can help better interpret the data and generate new hypotheses. Unsupervised learning methods designed for global dimension reduction or clustering fall short of identifying modules with factors acting in linear combinations. We present an exploratory data analysis method named MLSA (Modular Latent Structure Analysis) to estimate modular latent structures, which can find co-regulative modules that involve non-coexpressive genes. Through simulations and real-data analyses, we show that the method can recover modular latent structures effectively. In addition, the method also performed very well on data generated from sparse global latent factor models. Efficient Semiparametric GARCH Modelling of Financial Volatility Li Wang1 , Cong Feng1 , � Qiongxia Song2 and Lijian Yang3 1 University of Georgia


2

The University of Texas at Dallas Michigan State University [email protected] We consider a class of semiparametric GARCH models with additive autoregressive components linked together by a dynamic coefficient. We propose estimators for the additive components and the dynamic coefficient based on spline smoothing. The estimation procedure involves only a small number of least squares operations, thus it is computationally efficient. Under regularity conditions, the proposed estimator of the parameter is root-n consistent and asymptotically normal. A simultaneous confidence band for the nonparametric component is proposed by an efficient one-step spline backfitting. The performance of our method is evaluated by various simulated processes and a real financial return series. For the empirical financial return series, we find further statistical evidence of the asymmetric news impact function. 3
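The abstract above replaces parametric volatility components with spline-estimated additive terms; for orientation only, here is the ordinary parametric GARCH(1,1) recursion that such semiparametric models generalize, simulated in Python with made-up coefficients. This is not the authors' estimator.

    import numpy as np

    rng = np.random.default_rng(4)
    omega, alpha, beta = 0.05, 0.08, 0.90        # hypothetical GARCH(1,1) parameters
    T = 1000
    r = np.zeros(T)                              # returns
    h = np.zeros(T)                              # conditional variances
    h[0] = omega / (1 - alpha - beta)            # start at the unconditional variance
    for t in range(T):
        if t > 0:
            h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
        r[t] = np.sqrt(h[t]) * rng.standard_normal()
    print("sample variance:", round(r.var(), 3),
          "vs implied:", round(omega / (1 - alpha - beta), 3))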

Respiratory Protective Equipment, Health Outcomes, and Predictors of Mask Usage among World Trade Center Terrorist Attack Rescue and Recovery Workers Vinicius C. Antao1 , � L. L`azlo Pallos1 , Youn Shim1 , James H. Sapp II1 , Robert M. Brackbill1 , James E. Cone2 and Steven D. Stellman2 1 Agency For Toxic Substances and Disease Registry 2 New York City Department of Health [email protected] Following the September 11 terrorist attacks on the World Trade Center (WTC), thousands of rescue and recovery workers (RRW) reported serious respiratory illnesses. Much of this illness might have been prevented through proper selection, training, fit-testing, availability and consistent use of respiratory protective equipment (RPE). We analyzed the effects that using different types of masks had on respiratory outcomes among workers enrolled in the WTC Health Registry. We studied RRW who worked for at least one shift on the WTC debris pile and answered both Registry surveys in 2003-04 and 2006-07. Mask usage was categorized according to the least protective type of device used (greatest exposure). Outcomes of interest were recurrent symptoms and diseases (shortness of breath, wheezing, chronic cough, asthma, chronic obstructive pulmonary disease, and upper respiratory symptoms) with onset after, or which worsened after, 9/11 exposures. Our multivariate analyses were adjusted for demographics and exposure variables. The inclusion criteria were met by 9,296 of the RRW. Less than 20% of workers reported use of true respiratory protection on 9/11 and half of the workers wore no facial covering of any kind on that date. The strongest predictors of using adequate RPE were being affiliated with construction, utilities or environmental remediation organizations, and having received training in the use of RPE. Workers who reported no respiratory protection were more likely to report recurrent respiratory symptoms and diseases compared to those who used respirators. Training, selection, fit testing, and consistent use of RPE should be emphasized among all potential emergency responders.

Session 56: Meta Analysis and Evidence from Large Scale Studies Dynamics of Long-Term Imatinib Treatment Response � Min Tang1 , Mithat Gonen2 , Chani Field3 , Timothy P. Hughes3 , Susan Branford3 and Franziska Michor1 1 Harvard University and Dana-Farber Cancer Institute 2 Memorial Sloan-Kettering Cancer Center


3

The University of Adelaide [email protected] Treatment of chronic myeloid leukemia (CML) with the tyrosine kinase inhibitor imatinib represents a successful application of molecularly targeted anti-cancer therapy. However, the effect of imatinib on leukemic stem cells remains incompletely understood. Based on a statistical modeling approach using the 10-year imatinib treatment response of CML patients enrolled in the IRIS trial and a dataset with detailed follow-up for the first treatment year, we found that successful long-term imatinib therapy results in a tri-phasic exponential decline of BCR-ABL1 transcripts. The first slope of -0.056 ± 0.021 per day represents the turnover rate of leukemic differentiated cells, while the second slope of -0.0025 ± 0.0014 per day represents the turnover rate of leukemic progenitor cells. The third slope allows an inference of the behavior of immature leukemic cells, potentially stem cells. This third slope is negative in 18/22 patients (mean -0.0012, median -0.0008) and positive in 4/22 patients (mean 0.006, median 0.0013). This variability in response may be due to heterogeneity of patients, inconsistent adherence to drug, low levels of resistance, or variability in the detection of transcripts. Our approach suggests that long-term imatinib treatment may have the potential to reduce the abundance of leukemic stem cells in some patients. Using Probability Model to Evaluate Long Term Effect of FOBT in Colon Cancer Screening Dongfeng Wu University of Louisville [email protected] We apply the statistical method developed by Wu and Rosner (2010) using the Minnesota Colon Cancer Control Study (MCCCS) data. All initially asymptomatic participants will be classified into four mutually exclusive groups: True-early-detection, Noearly-detection, Over-diagnosis and Symptom-free; Human lifetime is treated as a random variable and is subject to competing risks. For both males and females in the study, we make predictive inferences on the percentage of these four categories in the FOBT screening. The probability of symptom-free is large for both genders and is about 95%. The probability of over-diagnosis among those detected by screening is lower than expected and is between 6% and 9%; the corresponding 95% C.I. is reported. The probability of true-earlydetection increases as screening interval decreases. It is about 76% for male patients and 85% for female patients if initial screening age is 50 and screening is done annually. The probability of noearly-detection increases as screening interval increases. We hope to provide valuable information on the effectiveness of the FOBT in colorectal cancer detection regarding the long-term effect. Effect of Selective Serotonin Reuptake Inhibitors on Risk of Fractures: a Meta-analysis of Observational Studies � Qing Wu1 , Angelle F. Bencaz2 , Joseph G. Hentz1 and Michael D. Crowell3 1 Biostatistics, Mayo Clinic in Arizona 2 School of Medicine, Louisiana State University Health Sciences Center 3 College of Medicine, Mayo Clinic in Arizona [email protected] Purpose: To assess whether people who take selective serotoninreuptake inhibitors (SSRIs) are at an increased risk of fracture. Methods: We conducted a meta-analysis of observational studies. Relevant studies published by February 2010 were identified through literature searches using MEDLINE (from 1966), EMBASE (from 1988), PsycINFO (from 1806), and manual searching ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts of reference lists. Only cohort or case-control studies that examined the association of SSRIs and risk of fracture and bone loss were included. Data were abstracted independently by 2 investigators using a standardized protocol; disagreements were resolved by consensus. Random effects models were used for pooled analysis. Results: Thirteen studies met inclusion criteria. Overall, SSRI use was associated with a significantly increased risk of fracture (relative risk [RR], 1.72 [95% confidence interval CI, 1.51-1.95]; P¡.001). An increased fracture risk associated with SSRIs also was observed in the 3 studies that adjusted for bone mineral density (RR, 1.70 [95% CI, 1.28-2.25]; P¡.001) and in the 4 studies that adjusted for depression (RR, 1.74 [95% CI, 1.28-2.36]; P¡.001). An increased annual bone loss of 0.19% was observed at the hip in 2 cohort studies of women (95% CI, 0.15-0.53%; P=.29). Conclusions: Use of SSRIs is associated with increased risk of fracture. The SSRIs may exert an increased risk of fracture independent of depression and bone mineral density. A Method to Estimate the Standard Error in Calculation of a Confidence Interval with Square-Root Transformed Data and Its Application to Meta Analysis in Clinical Trials � Jiangming Johnny Wu and Hanzhe Ray Zheng Merck Research Laboratories [email protected] Key Words: Square-root Transformation, Standard Error, ANOVA, Meta Analysis, Inverse Normal Method, Clinical Trials. Abstract: In clinical trials, Analysis of Variance (ANOVA) method is a very common practice. One of the key assumptions using ANOVA model is the normality of the data. However, in the real world of clinical trials this assumption is violated from time to time. A remedy under the circumstance of severe violation from normality is to use a non-linear transformation of the data. Square-root transformation, as well as log transformation and inverse transformation are some examples of the non-linear transformations that are commonly adopted. In a recent clinical trials, the square-root transformation was applied to the data before the analysis due to the fact that the non-transformed data did not meet normality assumptions. Subsequent meta analysis using the inverse normal method of this trial combined several similar trials led to concerns about the standard error estimate obtained because of the square-root transformed data. In this article we examine an alternative method to reestimate the standard error of data that has been transformed using the square-root transformation. Comparisons are made, using both clinical trial data and simulated data, between the original method and the alternative. Phenotype-Specific Gene Co-Expression Detection and Validation � Cuilan Gao and Cheng Cheng St. Jude Childrens Research Hospital [email protected] By utilizing the recent advance in microarray technology, investigators in biological and clinical studies are able to measure expression levels for thousands of genes simultaneously, across different conditions and over time. Analysis of data produced by such experiments offers potential insight into gene function and regulatory mechanisms. A key step to explore the system-level functionality of genes is to detect groups of co-expressed genes. One typical approach is to cluster the genes (or selected genes) by a certain metric of correlated genes first, then test if any of the clusters are associated with the phenotypes. In this paper, we propose a novel clustering algorithm to explore the gene co-expression pattern. 
Unlike the typical

gene clustering method, the proposed approach begins with testing the associations between each individual expressed gene and the phenotypes of interest. Then the top significantly phenotypeassociated expressed genes are selected by proper statistical significance criteria as leads for co-expression analysis. The co-expressed gene clusters are detected base on the correlation between all the other genes and the ‘core’ genes. In this way, the clusters represent co-expressed genes in the biological context defined by the phenotype. Meanwhile the computation burden is greatly reduced by starting with the lead genes. The association between each cluster and phonotype is further validated by an external validation procedure. The performance of the proposed method is demonstrated by simulated data and a gene expression profiling study of pediatric leukemia, both showed promising results.
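A stripped-down sketch of the two-step idea just described (Python, simulated expression matrix): genes are first screened by a marginal association test against a binary phenotype, the top hits serve as "core" genes, and every remaining gene is attached to the core gene with which it is most correlated. The significance criteria and external validation step of the actual method are omitted, and all data here are simulated.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, p, n_core = 60, 500, 5
    pheno = np.repeat([0, 1], n // 2)
    expr = rng.normal(size=(n, p))
    expr[:, :20] += 0.8 * pheno[:, None]                     # 20 genes truly phenotype-associated

    # step 1: marginal association of each gene with the phenotype
    tstat, pval = stats.ttest_ind(expr[pheno == 1], expr[pheno == 0], axis=0)
    core = np.argsort(pval)[:n_core]                         # lead ("core") genes

    # step 2: attach every other gene to its most correlated core gene
    corr = np.corrcoef(expr, rowvar=False)                   # gene-by-gene correlation matrix
    others = np.setdiff1d(np.arange(p), core)
    assignment = core[np.argmax(np.abs(corr[np.ix_(others, core)]), axis=1)]
    print("core genes:", core, "first assignments:", assignment[:5])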

Session 57: Semiparametric Models with Application in Biosciences Variable Selection in Partly Linear Censored Regression Model Shuangge Ma1 and � Pang Du2 1 Yale University 2 Virginia Polytechnic Institute and State University [email protected] Recent biomedical studies often measure two distinct sets of risk factors. The first is the low dimensional clinical and environmental measurements, and the second is the high dimensional gene expression measurements. For prognosis studies with right censored response variables, we propose a semiparametric regression model whose covariate effects have two parts. The first part is for the low dimensional covariates and takes a nonparametric form, whereas the second part is for the high dimensional covariates and takes a parametric form. An effective penalized variable selection approach is developed. The selection of parametric covariate effects is achieved using an iterated Lasso approach, for which we prove the selection consistency property. The nonparametric component is estimated using a sieve approach. An empirical model selection tool for the nonparametric component is derived based on the Kullback-Leibler geometry. Numerical studies show that the proposed approach has satisfactory finite-sample performance. Application to a lymphoma study illustrates the proposed method. Semiparametric Estimation and Variable Selection for Longitudinal Surveys � Lily Wang1 and Suojin Wang2 1 University of Georgia 2 Texas A&M University [email protected] A class of semiparametric marginal mean models is investigated for longitudinal surveys under multi-stage sampling design. A penalized estimating equation approach is proposed to select significant variables and estimate the coefficients simultaneously. The proposed estimators are design consistent and perform as well as the oracle ones when the correct model were known. EF-bootstrap is applied to obtain the standard errors of the estimated parameters with good accuracy. A fast and efficient algorithm is developed for survey practitioners to analyze complex longitudinal survey data within seconds. The results of Monte Carlo experiments confirm a good behavior of the proposed estimators with samples of moderate size. A health survey example is used to illustrate the application of the proposed method.


Abstracts Semi-parametric Hybrid Empirical Likelihood Inference for Two-sample Comparison With Censored Data � Haiyan Su1 , Mai Zhou2 and Hua Liang3 1 Montclair State University 2 University of Kentucky 3 University of Rochester [email protected] Two-sample comparison problems are often encountered in practical projects and have widely been studied in literature. Owing to practical demands, the research for this topic under special settings such as a semiparametric framework have also attracted great attentions. Zhou and Liang (2005) proposed an empirical likelihoodbased semi-parametric inference for the comparison of treatment effects in a two-sample problem with censored data. However, their approach is actually a pseudo-empirical likelihood and the method may not be fully efficient. In this study, we develop a new empirical likelihood-based inference under more general framework by using the hazard formulation of censored data for two sample semiparametric hybrid models. We demonstrate that our empirical likelihood statistic converges to a standard chi-squared distribution under the null hypothesis. We further illustrate the use of the proposed test by testing the ROC curve with censored data, among others. Numerical performance of the proposed method is also examined. Identification of Breast Cancer Prognosis Markers via Integrative Analysis � Shuangge Ma and Ying Dai Yale University [email protected] In breast cancer research, it is of great interest to identify genomic markers associated with prognosis. Multiple gene profiling studies have been conducted for such a purpose. Because of small sample sizes, genomic markers identified from the analysis of single datasets often do not have satisfactory reproducibility. A costeffective solution is to pool data from multiple comparable studies and conduct integrative analysis. In this study, we collect four breast cancer prognosis studies with gene expression measurements. We describe the relationship between prognosis and gene expressions using the accelerated failure time (AFT) models. We adopt a 2norm group bridge penalization approach for marker identification. This integrative analysis approach can effectively identify markers with consistent effects across multiple datasets and naturally accommodate the heterogeneity among studies. Statistical and simulation studies suggest satisfactory performance of this approach. Breast cancer prognosis markers identified using this approach have sound biological implications and satisfactory prediction performance.

Session 58: Statistical Methods for Analysis of Next Generation Sequences Data From Epigenetic Profiling to Understanding Transcription Regulatory Mechanism Xiaole Shirley Liu Dana-Farber Cancer Institute and Harvard School of Public Health [email protected] The application of ChIP-chip/seq in recent years has greatly expedited the mechanistic understanding of transcription and epigenetic regulation. Although epigenetic profiles are often considered a reflection of the overall transcriptional activities, they can be effectively mined to reveal novel transcriptional mechanisms. I will

80

introduce an approach where we use the dynamics of H3K4me2marked nucleosomes to infer the in vivo binding of multiple transcription factors and its application to prostate cancer and mouse gut development. I will also discuss another study to use HDAC3 ChIP-seq to identify the circadian gene regulation mechanism in liver. Robust Identification of Sparse Segments in Ultra-High Dimensional Data Analysis Tony Cai, � X Jessie Jeng and Hongzhe Li University of Pennsylvania [email protected] Motivated by DNA copy number variation analysis based on next generation sequencing data, we consider the problem of identifying sparse short segments hidden in an ultra long linear sequence of data with unspecified noise distribution. Based on a local median transformation, we propose a computationally efficient method called robust segment identifier (RSI), which provides a robust and near-optimal solution for segment identification over a wide range of noise distributions. We theoretically quantify the conditions for detecting the signal segments and show that the RSI consistently estimates the signal segments whenever it is possible to detect their existence. Simulation studies are carried out to demonstrate the effect of the data transformation and the efficiency of the method under different noise distributions. We also present results from an application to copy number variant analysis using next generation sequencing data of the HapMap Yoruban sample NA19240 to further illustrate the theory and the methods. Statistical Modeling of Multi-reads in ChIP-Seq Analysis � Dongjun Chung1 , Pei Fen Kuan2 , Bo Li3 , Rajendran Sanalkumar4 , Kun Liang1 , Emery H. Bresnick4 , Colin Dewey3 and Sunduz Keles1 1 Department of Statistics, University of Wisconsin-Madison 2 Department of Biostatistics, University of North Carolina at Chapel Hill 3 Department of Computer Science, University of WisconsinMadison 4 Department of Stem Cell and Regenerative Biology, University of Wisconsin-Madison [email protected] Chromatin immunoprecipitation followed by sequencing (ChIPSeq) has revolutionized the genome-wide study of transcriptional regulation, such as profiling of DNA-binding proteins, histone modification, and nucleosome occupancy. Current statistical methods to analyze ChIP-Seq data only use reads that map to a single position on a reference genome (uni-reads), throwing out reads that map to multiple positions (multi-reads), which account for up to 20% of alignable reads. Moreover, such approaches make it challenging to identify binding locations that reside in regions of genome that have been duplicated over evolutionary time. We describe a general approach for utilizing multi-reads, adding power to resolve peaks. Essentially, multi-reads are allocated using our algorithm based on a generative probabilistic model. We demonstrate the utility of this multi-read data in studies of human STAT1 and mouse GATA1 transcription factors, showing greater sequencing depth and finding peaks which were missed in uni-read studies. Overall, the majority of peaks unique to multi-read analysis reside in highly repetitive regions on the genome. Importantly, we validated a number of our newly discovered GATA1 peaks experimentally, showing that they represent authentic binding targets of GATA1 transcription factor and reproducibly regulated their downstream genes. These compuICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts tational and experimental results establish that multi-reads can be of critical importance for studying transcription factor binding in highly repetitive regions of genomes with ChIP-Seq experiments. Modeling Intensity data from ABI/SOLiD second generation sequencing � Hao Wu1 , Rafeal Irizarry2 and H´ector Corrada Bravo3 1 Emory University 2 The Johns Hopkins University 3 University of Maryland [email protected] Capable of sequencing millions of short DNA fragments in parallel, second-generation sequencing technologies have rapidly revolutionized genomic research. The applications of this technology range from genotyping and structural variation analysis in whole genome studies to transcriptome quantification and reconstruction. Among several available platforms, SOLiD system from Applied Biosystems Inc. (ABI) provides an unique approach to translate a pair of adjacent nucleotides into one of the four colors during sequencing. The colors reported from the SOLiD system (color-calls) are results of a complicated statistical manipulation of noisy fluorescence intensity measurements, which introduces systematic biases that may mislead downstream analysis. In this talk I will first present a simple intensity pre-processing method for correcting these biases. A version of quantile normalization was developed which substantially improves yield and accuracy of calls at a small computational cost. In the second part of the talk, I will present a model based quality assessment of the color reads. A simple linear model was applied to the intensity measurements to capture the uncertainty arising in the base calling procedures. Compared to the factory provided quality scores, the results from our model provide more insightful read qualities.
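Quantile normalization, mentioned above as the pre-processing step for the SOLiD intensities, can be sketched in a few lines (Python, with a random matrix standing in for the intensity channels): every column is forced to share the same empirical distribution, taken as the mean of the column-wise sorted values. Ties are ignored, and this is a generic sketch rather than the version developed in the talk.

    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.gamma(shape=2.0, scale=1.0, size=(1000, 4))      # stand-in intensity matrix (rows = reads)

    order = np.argsort(X, axis=0)
    ranks = np.argsort(order, axis=0)                        # rank of each entry within its column
    ref = np.sort(X, axis=0).mean(axis=1)                    # reference: mean of the sorted columns
    Xq = ref[ranks]                                          # each column now has the same distribution

    print(np.allclose(np.sort(Xq[:, 0]), np.sort(Xq[:, 1])))  # True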

Motivated by this observation, we present a method by which the distribution of a sum of dependent binary random variables is approximated and demonstrate the method by exploring the problem of estimating network density - a simple but fundamental characterization of a network - in the context of correlation networks with Gaussian noise. We illustrate in the context of functional connectivity networks in neuroscience. Integrative Prescreening in Analysis of Multiple Cancer Genomic Studies �

Rui Song1 , Jian Huang2 and Shuangge Ma3

1

Colorado State University

2

University of Iowa

3

Yale University [email protected] In high throughput cancer genomic studies, results from analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis can effectively pool and analyze multiple datasets and provides a cost effective way to improve reproducibility. In integrative analysis, simultaneously analyzing all genes profiled may incur high computational cost. A computationally affordable remedy is prescreening, which fits marginal models, can be conducted in a parallel manner, and has low computational cost. In this article, we investigate integrative prescreening with multiple cancer genomic datasets. Theoretical properties are established. Simulation shows that the proposed integrative prescreening has better finite sample performance than alternatives including prescreening with individual datasets, an intensity approach, and meta-analysis. We apply the proposed approach to microarray gene profiling studies on liver and pancreatic cancers.
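As a toy version of the prescreening idea above (not the authors' estimator), the snippet below computes a marginal association z-statistic for every gene in each of several simulated datasets, pools the evidence across datasets, and keeps the top-ranked genes for downstream joint modelling. Sample sizes, effect sizes, and the pooling rule are all invented for illustration.

    import numpy as np

    rng = np.random.default_rng(7)
    datasets = []
    for n in (80, 100, 60):                                   # three hypothetical studies
        X = rng.normal(size=(n, 2000))
        y = X[:, :15] @ np.full(15, 0.4) + rng.normal(size=n)  # same 15 genes active in every study
        datasets.append((X, y))

    z_sq = np.zeros(2000)
    for X, y in datasets:
        Xc = (X - X.mean(0)) / X.std(0)
        yc = (y - y.mean()) / y.std()
        z = Xc.T @ yc / np.sqrt(len(y))                       # marginal association per gene
        z_sq += z ** 2                                        # pool evidence across datasets

    keep = np.argsort(z_sq)[::-1][:50]                        # prescreened gene set
    print("true positives among top 50:", np.sum(keep < 15))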

Session 59: Network and Related Topics

UPS Delivers Optimal Phase Diagram in High Dimensional Variable Selection

Some Recent Advances in Compressed Counting Ping Li1 and � Cun-Hui Zhang2 1 Cornell University 2 Rutgers University [email protected] Compressed counting uses network sketches to track summary statistics of dynamic data. Typically, it uses a much smaller number of sketches than compressed sensing, where convex penalization or minimization methods can be used to update the entire data stream. A well-understood method of compressed counting is symmetric stable random projection. In this talk, we discuss efficient estimation of the Shannon entropy of dynamic data based on maximally-skewed stable random projections.

Pengsheng Ji1 and � Jiashun Jin2

Quantifying Uncertainty in Network Summary Statistics � Eric Kolaczyk and Wes Viles Boston University [email protected] Network-based data (e.g., from sensor, social, biological, and information networks) now play an important role across the sciences. Frequently the graphs used to represent networks are inferred from data. Surprisingly, however, in characterizing the higher-level properties of these networks (e.g., density, clustering, centrality), the uncertainty in their inferred topology typically is ignored. The distribution of estimators characterizing these networks defined implicitly through standard thresholding procedures can have distributions complicated by dependence inherent among the thresholded events. ICSA Applied Statistics Symposium 2011, NYC, June 26-29

1

Cornell University

2

Carnegie Mellon University [email protected] We consider a linear regression model where both p and n are large but p > n. The vector of coefficients is unknown but is sparse in the sense that only a small proportion of its coordinates is nonzero, and we are interested in identifying these nonzero ones. We propose a two-stage variable selection procedure which we call the UPS. This is a Screen and Clean procedure, in which we screen with the Univariate thresholding, and clean with the Penalized MLE. In many situations, the UPS possesses two important properties: Sure Screening and Separable After Screening (SAS). These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. As a result, the UPS is effective both in theory and in computation. The lasso and the subset selection are well-known approaches to variable selection. However, somewhat surprisingly, there are regions where neither the lasso nor the subset selection is rate optimal, even for very simple design matrices. The lasso is non-optimal because it is too loose in filtering out fake signals (i.e. noise that is highly correlated with a signal), and the subset selection is non-optimal because it tends to kill one or more signals in correlated pairs, triplets, etc.
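The snippet below is only a caricature of the Screen-and-Clean recipe described above: univariate thresholding keeps coordinates whose marginal statistic is large, and the survivors are then refit jointly. Ordinary least squares stands in for the penalized-MLE cleaning step, and the threshold and simulated design are chosen purely for illustration.

    import numpy as np

    rng = np.random.default_rng(8)
    n, p, s = 100, 400, 5
    X = rng.normal(size=(n, p))
    beta_true = np.zeros(p)
    beta_true[:s] = 1.0
    y = X @ beta_true + rng.normal(size=n)

    # screen: univariate thresholding on the marginal statistics |X_j' y| / n
    marginal = np.abs(X.T @ y) / n
    threshold = np.sqrt(2 * np.log(p) / n)
    survivors = np.where(marginal > threshold)[0]

    # clean: joint refit on the survivors (OLS here as a stand-in for the penalized MLE)
    beta_hat = np.zeros(p)
    coef, *_ = np.linalg.lstsq(X[:, survivors], y, rcond=None)
    beta_hat[survivors] = coef
    print(len(survivors), "survive screening; true signals recovered:",
          np.sum(np.abs(beta_hat[:s]) > 0.5))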


Abstracts Session 60: Current Approaches for Pharmaceutical Benefit-Risk Assessment Frameworks, Guidelines and Methodologies: What is New in Benefit-Risk Assessment? Bennett Levitan Johnson & Johnson Pharmaceutical RD, L.L.C. [email protected] The assessment and representation of benefit-risk “balance“ in pharmaceutical development has become increasingly important. The pressure for these assessments is made all the more difficult by the lack of any standardized or commonly-accepted approach to performing them. The variety of safety endpoints, with varying timescales and degree of impact on patients, makes quantitative approaches difficult. As a result, efficacy and safety results from clinical studies are often reported separately, and benefit-risk is often only characterized qualitatively. To address these issues, a number of industry and regulatory organizations have ongoing projects to develop standardized approaches towards benefit-risk assessment. The European Medicines Agency is conducting a three year Benefitrisk Methodology Project to identify decision-making models that can be used to make the assessment of medicines more consistent and transparent. The FDA has described a potential qualitative framework to help with structuring discussions on benefit-risk and serve as a standardized structure. The Pharmaceutical Research and Manufacturers of America (PhRMA) has developed the “BRAT Framework,“ a set of processes and tools for selecting, organizing, summarizing, and interpreting data relevant to benefit-risk decisions, the CMR International Institute for Regulatory Science has held a series of meetings on the topic that is leading towards a separate framework approach to benefit-risk, and other projects are underway. This presentation will present an overview of these projects as well as touch upon some of the technical approaches they use for quantitative benefit-risk assessment. Conjoint Analysis and Utility Models: Case Studies in Drug Benefit-Risk Assessment James Cross Genentech, Inc. [email protected] The assessment of a pharmaceutical product’s benefit-risk profile must consider a variety of factors which makes it complex. Various methods have been proposed as analytic tools to facilitate this evaluation. Two approaches, conjoint analysis and utility-based disease models, involve the use of weights to quantify the benefit-risk profile. Practical examples of these methods are needed to understand better their strengths and weaknesses. Two case studies, in metabolic and inflammatory diseases, are presented here to highlight the applicability of these approaches to facilitate benefit-risk assessment and corresponding decision-making in drug development and regulation. Benefit-Risk of Multiple Sclerosis Treatments: Lessons Learnt in Multi-Criteria Decision Analysis Richard Nixon1 , Pedro Oliveira2 and � Blair Ireland3 1 Novartis Pharmaceuticals Corporation 2 RTI Health Solutions 3 Novartis Pharmaceuticals Corporation [email protected] Benefit-risk analysis comprises an approach and a set of tools for qualitative structuring and quantitative analysis of the favorable and unfavorable outcomes of a decision. Multi-Criteria Decision Analysis (MCDA) is a method used to assess benefit-risk in health care.


We apply MCDA to assess the relative benefit-risk of four treatments for relapsing remitting multiple sclerosis (RRMS). Salient adverse events, clinical benefits and convenience measures are identified, and the magnitude of each of these criteria resulting from each treatment is found. The relative utilities of these magnitudes are assessed from experts and combined into an overall benefit-risk score. Sensitivity analysis is performed to identify the criteria with the greatest effects on the benefit-risk score. We reflect on lessons learnt during this exercise and the suitability of MCDA for performing benefit-risk analysis in health care. We conclude that it is a method well suited for this task, and we make suggestions for its use in practice.
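At its simplest, the MCDA benefit-risk score described above reduces to a weighted sum of per-criterion utilities; the criteria, swing weights, and utility values below are invented purely to show the arithmetic, not taken from the RRMS case study.

    import numpy as np

    criteria = ["relapse reduction", "disability progression", "serious infection", "convenience"]
    weights = np.array([0.40, 0.30, 0.20, 0.10])             # hypothetical swing weights (sum to 1)

    # hypothetical 0-100 utility of each treatment on each criterion (rows = treatments)
    utilities = np.array([
        [80, 70, 40, 60],
        [60, 55, 80, 90],
        [90, 80, 20, 40],
        [50, 45, 90, 95],
    ], dtype=float)

    scores = utilities @ weights                              # overall benefit-risk score per treatment
    for i, s in enumerate(scores):
        print(f"treatment {i + 1}: {s:.1f}")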

Session 61: Causal Inference and its Applications in Drug Development Semiparametric Dimension Reduction for Mean Response Estimation with Application to HIV Study � Zonghui Hu, Dean Follmann and Jing Qin National Institutes of Health [email protected] This work focuses on the estimation of marginal mean response where the response observation may be incomplete but multiple covariates are available. A semiparametric estimator is proposed, which adopts nonparametric regression with a parametric working index. The desirable property of the estimator is two-fold. First, it eliminates the curse of dimensionality in nonparametric regression. Second, it has extended robustness to model misspecification: it is consistent for any working index if the missing probability of the response is correctly specified; even with the missing probability incorrectly specified, it is still consistent so long as the working index contains the essential information about the conditional mean response. We apply the proposed method to a randomized clinical trial in HIV. A Causal Effect Model with Stochastic Monotonicity Assumption � Chenguang Wang1 , Mike Daniels2 and Daniel Scharfstein3 1 U.S. Food and Drug Administration 2 University of Florida 3 The Johns Hopkins University [email protected] We propose an approach to obtain the causal effect of a treatment in a randomized controlled clinical trial where there is dropout due to loss to follow-up and protocol defined events (e.g. cancer progression, death, etc.). Specifically, we estimate the effect of treatment for those who would not have dropped out due to a protocol defined event under either treatment at a given time point, which is quantity that is similar to the survivor average causal effect (SACE). In this approach, we consider weaker assumptions than standardly applied assumptions such as deterministic monotonicity assumption that rules out defiers. We find bounds on the causal effects and characterize the uncertainty associated with the estimated bounds in a Bayesian framework. Assessing the Treatment Effects in Randomized Clinical Trials under Principal Stratification Framework Yahong Peng Pfizer Inc. [email protected] The Intent-To-Treat (ITT) based approach to assess treatment effect in randomized clinical trial has been the gold-standard and widely ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts accepted in drug development. The interpretation of the treatment effect can be complicated or compromised when not all subjects receive/comply with the intended treatment according to randomization due to variety of reasons. Principal stratification (Frangakis and Rubin 2002) provides a framework of alternative approaches to assess treatment effect, specifically the compliance average causal effect (CACE) which compares the treatment effect between active and control groups only among the subgroup of principal compliers. In this presentation, we’ll discuss potential different parameters of interest, and corresponding statistical analysis strategies with associated model assumptions to evaluate treatment effects in randomized clinical trials. A real data example from a Proof-of-Concept (POC) randomized clinical trial of a therapeutic vaccine in treating Alzheimer’s Disease patients will be used to illustrate different approaches. Causal Inference and Safety Analysis of Weight Loss Drugs Daniel Rubin U.S. Food and Drug Administration [email protected] The FDA has recently rejected several drugs that appeared effective for weight loss in short term studies, due to possible safety signals related to blood pressure increases or other surrogates for cardiovascular risk. Because sponsors have proposed risk management plans in which only weight loss responders would be kept on treatment for long term use, one goal of regulatory reviewers has been to assess whether there is a safety signal in this subset of subjects. A comparison between weight loss responders in treatment and placebo arms is not protected from confounding by randomization, so as an exploratory analysis we formulated the problem through principal stratification, and considered causal assumptions that allowed upper and lower bounds on average blood pressure outcomes within principal strata. This talk concerns ongoing collaborative research, and does not necessarily reflect the views of the Food and Drug Administration.

Session 62: Recent Advance in Multiplicity Approach Graphical Approaches to Multiple Test Procedures Frank Bretz Novartis Pharmaceuticals Corporation [email protected] Methods for addressing multiplicity are becoming increasingly more important in clinical studies. Several multiple test procedures have been developed that allow one to reflect the relative importance of different study objectives, such as fixed-sequence, fallback, and gatekeeping procedures. In this presentation we focus on graphical approaches that can be applied to common multiple test problems, such as comparing several treatments with a control, assessing the benefit of a new treatment for more than one endpoint, combined non-inferiority and superiority testing, or any combination thereof. Using graphical approaches, one can easily construct and explore different test strategies and thus tailor the test procedure to the given study objectives. The resulting procedures are represented by directed, weighted graphs, where each node corresponds to an elementary null hypothesis, together with a simple algorithm to generate such graphs while sequentially testing the individual hypotheses. The graphical approach will be presented using weighted Bonferroni tests and illustrated with several case studies using the graphical user interface from the gMCP package in R, which is freely available on CRAN. Extensions to weighted parametric and Simes tests will be discussed briefly. ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Multistage Parallel Gatekeeping with Retesting � 1

George Kordzakhia1 and Alex Dmitrienko2

U.S. Food and Drug Administration

2

Eli Lilly and Company [email protected] This talk introduces a general method for constructing multistage parallel gatekeeping procedures with a retesting option. This approach serves as an extension of general multistage parallel gatekeeping procedures (Dmitrienko, Tamhane and Wiens, 2008) and parallel gatekeeping procedures with retest- ing for two-family problems (Dmitrienko,Kordzakhia and Tamhane, 2011). It was shown in the latter paper that power of parallel gatekeeping procedures can be im- proved by adding an additional retesting stage which enables retesting of the primary null hypotheses using a more powerful component procedure than the original procedure if all secondary null hypotheses are rejected. The new method enables clinical trial researchers to construct a class of more general multistage parallel gatekeeping procedures with retesting. The new procedures support multiple retesting of the primary and secondary families and do not require all secondary null hypotheses be rejected. Introduction to the Union Closure Method with Application to the Gatekeeping Strategy in Clinical Trials �

Han-Joo Kim1 , Richard Entsuah2 and Justine Shults3

1

Forest Laboratories, Inc.

2

Merck Research Laboratories

3

University of Pennsylvania School of Medicine [email protected] Clinical trials often involve testing multiple endpoints that are naturally grouped into families of hypotheses in a hierarchical order. The gatekeeping strategy is emerging as an important direction of improvement in the analysis of clinical trials. Gatekeeping is a multiple testing strategy which prioritizes groups of endpoints and then tests them sequentially in order to increase the power available for tests that involve more clinically important endpoints. In this talk, I will describe the union closure method that generalizes the closure method into a fixed order sequence. The proposed method can be used to design various gatekeeping procedures which sequentially examine each family with an appropriate alpha allocation. A graphical illustration will be given to show how the union closure method enforces coherence among various partitions of intersection hypotheses in the closure across all families in order to allow a short-cut in the procedure. An example from a clinical development program in fixed dosed combinations of two bronchodilators for patients with chronic obstructive pulmonary disease (COPD) will be used to illustrate step-by-step computation of the adjusted p-values. An Adaptive Alpha Spending Approach in Group Sequential Trials David Li Pfizer Inc. [email protected] Alpha spending functions have become a very popular approach in clinical trials where several interim analyses are planned. This presentation will introduce an adaptive alpha spending approach when the interim analysis timings are pre-determined.


Abstracts Session 63: Bioassay: Methodology for a Rapidly Developing Area Quantification of Real Time Polymerase Chain Reaction Data � Xuelin Huang1 , Wei Wei2 and Jing Ning3 1 The University of Texas MD Anderson Cancer Center 2 The University of Texas MD Anderson Cancer Center 3 The University of Texas Health Science Center at Houston [email protected] The quantitative real-time polymerase chain reaction (PCR) can now amplify even a tiny amount of DNA material to a detectable level and then back-calculate its initial number of molecules from the real-time PCR curve. It has been the golden standard used in modern molecular biology research to confirm discoveries by other techniques, such as microarrays and single nucleotide polymorphism (SNP) arrays. However, so far, most of the methods for the aforementioned back-calculation from PCR curves were developed by biologists. There is a great room for improvement by statisticians. Various parametric S-shape models have been proposed for such data. Although they all visually fit well to the PCR data curve, their performances on estimating the initial amount of DNA molecules are not satisfactory. We propose a new approach for the quantification of PCR data to improve the data quality. Data Analysis for Limiting Dilution Transplantation Experiments Hao Liu Baylor College of Medicine [email protected] As the definitive method to study mammary gland stem cells or tumor-initiating cells, limiting dilution transplantation experiments are a quantitative in vivo bioassay method. The experiment transplants mammary cells at limiting dilutions into cleared mammary fat pads. Frequency of stem cells is then analyzed by the traditional method for limited dilution assay under a single-hit Poisson model. Complicated circumstance however can arise in transplantation experiments. For example, certain experiments transplant epithelial stem cells into two cleared mammary glands within the same mouse. This results in correlated count data, and it is of biological interest to investigate the correlation. We develop a Bayesian method to analyze such bivariate correlated count data under the single hit Poisson model assumption. The method is illustrated by the analysis of real data. Design for a Small Serial Dilution Series � Daniel Zelterman1 , Alexander Tulupyev2 , Robert Heimer1 and Nadia Abdala1 1 Yale University 2 Russian Academy of Sciences [email protected] We describe statistical plans for a serial dilution series designed to detect and estimate the number of viral particles in a solution. The design addresses a problem when a very limited number of aliquots are available for proliferation. A gamma prior distribution on the number of viral particles allows us to describe the marginal probability distribution of all experimental outcomes. We examine a design that minimizes the expected reciprocal information and compare this with the maximum entropy design. We argue that the maximum entropy design is more useful from the point of view of the laboratory technician. The problem and design are motivated by our study of the viability of HIV in syringes and other equipment that might mediate blood-borne viral transmission.
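For the limiting dilution transplantation setting above, the traditional single-hit Poisson analysis mentioned in the abstract can be sketched as a complementary log-log binomial GLM with a log(dose) offset; the dose levels and outgrowth counts below are hypothetical, and the Bayesian bivariate extension of the talk is not shown.

```r
# Single-hit Poisson model: with stem-cell frequency phi, a transplant of d cells
# is negative with probability exp(-phi * d), so cloglog(P(positive)) = log(phi) + log(d).
lda <- data.frame(
  dose     = c(10, 50, 100, 500, 1000),   # cells transplanted per cleared fat pad (hypothetical)
  positive = c(1, 3, 6, 9, 10),           # pads with mammary outgrowth (hypothetical)
  tested   = c(10, 10, 10, 10, 10)
)
fit <- glm(cbind(positive, tested - positive) ~ 1 + offset(log(dose)),
           family = binomial(link = "cloglog"), data = lda)
phi <- exp(coef(fit)[1])                  # estimated stem-cell frequency per transplanted cell
c(frequency = phi, one_in = 1 / phi)
```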


Session 64: Statistical Challenges in Developing Biologics

Multiple-stage Sampling Procedure for Lot Release with Consideration of Both Manufacturer's and Consumer's Risks
*Bo-guang Zhen, Tiehua Ng and Henry Hsu
U.S. Food and Drug Administration, CBER
[email protected]
Sampling inspection is implemented to ensure the quality of biological products before their release to the market. However, the existing procedures in sampling inspection for lot release focus on the manufacturer's risk. If both risks are accounted for, a much larger sample size would be required under certain circumstances. Sometimes the sample size might be too large to be implemented in practice. This presentation will first illustrate the sampling process using the American National Standard, and then propose a multiple-stage sampling procedure for lot release with control of both the manufacturer's and the consumer's risks at pre-specified levels. Examples are presented to illustrate the use of the proposed procedure. Issues for lot release and the use of different sampling procedures are discussed.

An Alternate Gamma-Fitting Statistical Method for Anti-Drug Antibody Assays to Establish Assay Cut-Points for Data with Non-Normal Distribution
Brian Schlain
Biogen Idec
[email protected]
Anti-drug or anti-therapeutic antibodies (ADAs, ATAs) can impact the safety and efficacy of the therapeutic by inducing a range of reactions from hypersensitivity to neutralization of the drug. Assessments of immunogenicity are dependent on the bioanalytical method used to test samples, in which a positive versus negative reactivity is determined by a statistically derived cut-point based on the distribution of drug-naïve samples. Non-normal immunogenicity data distributions, which tend to be unimodal and positively skewed, can often be modeled by 3-parameter gamma fits.

Issues in Fitting Bioassay Data
*Jerry W. Lewis1 and Jason Liao2
1 Biogen Idec; 2 Teva Pharmaceuticals
[email protected]
Under usual regularity conditions, least squares gives best linear unbiased estimation for linear models. It is well known that these optimality properties do not obtain with non-linear models. A novel method of visualizing the least squares surface will be presented that illuminates design-based biases in estimation, and supports recommendations in USP draft guidelines. The exploration will be graphical and intuitive, and will lead to specific recommendations for commercial software.

Analytical Comparability for Bioprocess Changes and Follow-on Biologics
Jason Liao
Teva Branded Pharmaceutical Products R&D, Inc.
[email protected]
Manufacturers of biotechnological products and vaccines must frequently introduce manufacturing changes throughout clinical development and after obtaining marketing authorization approval. These changes can be supported by a comparability study, which is defined as a quantitative assessment of the extent of similarity of drug substances and drug products made before and after a process change. This includes the comparison of follow-on biologics with those of the innovator. The purpose of comparability studies is to ensure that the follow-on or post-change material/product/process does not have any significant changes that impact the material/product/process quality, safety and efficacy clinically compared to the innovator or pre-change material/product/process. Assessing comparability is NOT just about meeting the predefined specification/criterion. The analytical capability to differentiate small but meaningful differences is very important. In this talk, three statistical models are proposed for: (1) a paired head-to-head analytical comparability where pre- and post-change materials can be tested simultaneously, such as the measurements in a potency assay; (2) a non-paired head-to-head comparison where the two materials can be tested within a single assay run, such as sequential measurements by an HPLC-based method; and (3) one-sided testing of new post-change materials against a pre-existing database. Real data sets are used to demonstrate these approaches.
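As a simplified single-stage illustration of controlling both risks in lot-release sampling, one can search for the smallest plan (n, c) meeting a producer-risk constraint at an acceptable quality level and a consumer-risk constraint at a rejectable quality level. The quality levels and risk limits below are hypothetical, and the multiple-stage procedure of the talk is not reproduced.

```r
# Smallest single-stage plan: accept the lot if at most `acc` defectives among n sampled units.
p_aql <- 0.01    # hypothetical acceptable quality level (manufacturer's risk applies here)
p_rql <- 0.08    # hypothetical rejectable quality level (consumer's risk applies here)
alpha <- 0.05    # manufacturer's (producer's) risk
beta  <- 0.10    # consumer's risk

plan <- NULL
for (n in 1:500) {
  for (acc in 0:n) {
    ok_producer <- pbinom(acc, n, p_aql) >= 1 - alpha   # good lots accepted often enough
    ok_consumer <- pbinom(acc, n, p_rql) <= beta        # bad lots rarely accepted
    if (ok_producer && ok_consumer) { plan <- c(n = n, c = acc); break }
  }
  if (!is.null(plan)) break
}
plan   # smallest sample size and acceptance number meeting both risks
```

Accounting for the consumer's risk in addition to the manufacturer's risk is what drives the sample size up, which motivates the multiple-stage construction discussed in the session.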

Session 65: Experimental Design and Clinical Trials Efficient and Ethical Adaptive Randomization Designs for Multi-armed Clinical Trials with Weibull Time-to-event Outcomes � Oleksandr Sverdlov1 , Yevgen Ryeznik2 and Weng Kee Wong3 1 Bristol-Myers Squibb 2 Kharkov National University of Economics 3 University of California, Los Angeles [email protected] In phase II of drug development, adaptive dose-response designs have been proposed as alternatives to traditional fixed-dose parallel group designs in order to facilitate efficient learning of doseresponse and minimize the exposure of study subjects to inferior dose levels, among other objectives. Despite profusion of adaptive dose finding methods in the recent literature, most of these methods assume binary or continuous outcomes, often modeled using monotone dose-response relationships. Many clinical trials use time-toevent-outcomes as primary measures of safety and/or efficacy. In this talk, we consider an experimental design problem of finding an optimal design strategy for clinical trials with multiple treatment arms and right-censored time-to-event primary outcomes which are modeled using the Weibull family of distributions. The D-optimal design is obtained, together with compound optimality designs providing tradeoffs between efficiency and ethical criteria. The proposed designs are studied theoretically, and are implemented using cohort response-adaptive randomization. We conclude that some of our designs can outperform traditional balanced randomization designs, if assessed in terms of multiple criteria, such as randomization, efficiency, and ethics. The Projective Generalized Aberration Criterion and Its Applications in Factorial Designs Chang-Xing Ma State University of New York at Buffalo [email protected] A new criterion - minimum projective generalized aberration is developed for comparing factorial designs. Its properties are discussed. It extends minimum generalized aberration (MGA) and classical minimum aberration (MA) criteria. The new criterion enables us to order two isomorphic designs with the same MGA. Phase II Cancer Clinical Trials with Heterogeneous Patient Populations Sin-Ho Jung1 , Myron N. Chang2 and � Sun J. Kang3 ICSA Applied Statistics Symposium 2011, NYC, June 26-29

1 Duke University; 2 University of Florida; 3 SUNY Downstate Medical Center
[email protected]
The patient population for a phase II trial often consists of multiple subgroups in terms of risk level. In this case, a popular design approach is to specify the response rate and the prevalence of each subgroup, to calculate the response rate of the whole population as the weighted average of the response rates across subgroups, and to choose a standard phase II design, such as Simon's optimal or minimax design, to test the response rate for the whole population. Although the prevalence of each subgroup may be accurately specified, the observed prevalence among the patients accrued to the study may be quite different from the expected one because of the small sample size, which is typical of most phase II trials. The fixed rejection value for a chosen standard phase II design may be either too conservative (i.e., increasing the false rejection probability of the experimental therapy) if the trial accrues more high-risk patients than expected, or too anti-conservative (i.e., increasing the false acceptance probability of the experimental therapy) if the trial accrues more low-risk patients than expected. We can avoid this problem by adjusting the rejection values depending on the observed prevalence from the trial. In this paper, we investigate the performance of the flexible designs compared with the standard design with fixed rejection values under various settings.
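For reference, the operating characteristics of a fixed Simon two-stage design, which the flexible designs above adjust when the observed prevalence differs from the planned one, can be computed directly; the design parameters and response rates below are hypothetical.

```r
# Operating characteristics of a Simon two-stage design (r1/n1 at stage 1, r/n overall):
# stop early if at most r1 responses among n1 patients; declare promising if total responses > r.
simon_oc <- function(p, r1, n1, r, n) {
  pet <- pbinom(r1, n1, p)                               # probability of early termination
  x1  <- (r1 + 1):n1
  rej <- sum(dbinom(x1, n1, p) * (1 - pbinom(r - x1, n - n1, p)))
  c(PET = pet, P_declare_promising = rej)
}
simon_oc(p = 0.10, r1 = 1, n1 = 10, r = 5, n = 29)   # under a hypothetical null response rate
simon_oc(p = 0.30, r1 = 1, n1 = 10, r = 5, n = 29)   # under a hypothetical alternative
```

Evaluating these quantities at subgroup-weighted response rates shows how the realized type I error and power drift when the accrued prevalence departs from the planned one.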

A Proof-of-Concept Clinical Trial Design Combined with Dose Ranging Exploration Xin Wang1 and � Naitee Ting2 1 Pfizer Inc. 2 Boehringer Ingelheim Pharmaceuticals [email protected] In recent years, pharmaceutical industry has experienced many challenges in discovering and developing new drugs. Given these challenges, many sponsors attempt to speed up the clinical development process. One of these processes is to combine the Proof of Concept (PoC) and the dose-ranging clinical studies into a single trial. This manuscript proposes approaches to help address both PoC and dose-ranging objectives in such a combined design. One proposal is to use a linear-trend test for PoC, together with a serial gatekeeping method (TGK) to identify individual doses; the other is to use the dose response curve estimated from a 3-parameter Emax model to establish PoC and explore activities of various doses. Simulations were performed to evaluate the performance of both proposals. Recommendations were made based on the simulation results.
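A minimal sketch of the Emax-based proposal above, fitting a 3-parameter Emax curve to combined PoC and dose-ranging data, might look as follows; the simulated data set and starting values are hypothetical.

```r
# 3-parameter Emax model: E(dose) = e0 + emax * dose / (ed50 + dose)
set.seed(1)
dose <- rep(c(0, 5, 25, 75, 150), each = 20)                       # hypothetical dose groups
dat  <- data.frame(dose = dose,
                   resp = 1 + 8 * dose / (30 + dose) + rnorm(length(dose), sd = 2))

fit <- nls(resp ~ e0 + emax * dose / (ed50 + dose),
           data = dat, start = list(e0 = 1, emax = 8, ed50 = 30))
summary(fit)
predict(fit, newdata = data.frame(dose = c(0, 5, 25, 75, 150)))    # estimated dose-response
```

The fitted curve supports both objectives at once: a contrast of the top of the curve against placebo addresses PoC, while the estimated ED50 and plateau describe the activity of the individual doses.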

Session 66: Advancing Clinical Trial Methods A “Simonized” Bayesian Design for Phase II Cancer Clinical Trials � Yimei Li1 , Rosemarie Mick2 and Daniel F Heitjan2 1 The Children’s Hospital of Philadelphia 2 Philadelphia University [email protected] Phase II cancer clinical trials commonly employ two-stage designs, such as Simon’s optimal design, that incorporate a single interim analysis for lack of efficacy and are designed to achieve specified frequentist properties. Practical concerns complicate the use of such designs in actual clinical trials: the rigid requirement to examine the outcome at a pre-specified sample size; the potential need to suspend accrual when approaching the end of a stage; and the unpredictability of an interim analysis time that is driven by patient


Abstracts accrual. We aim to develop a flexible design that is easier to implement than current designs while maintaining desirable statistical properties. Specifically we propose a “Simonized” Bayesian design, which emulates the properties of the Simon design in a Bayesian analysis with a pre-scheduled interim analysis time, and provide the option of suspending accrual or not. Simulation studies demonstrate that the design maintains good operating characteristics, such as controlled type I and type II error rates, high probability of early termination and small expected sample size when the experimental regimen is inactive. Prediction in Randomized Clinical Trials � Gui-shuang Ying and Daniel F. Heitjan University of Pennsylvania [email protected] In clinical trials with time to event as an outcome, the statistical power/sample size calculation is often determined by the number of events, and many trials are designed to end upon the occurrence of a pre-specified number of events. For the logistic and operation reasons, it is of interest to predict the time to reach certain number of events, and the optimal combination of total number of enrollment and the length of follow-up to reach targeted number of events. In this talk, we will present prediction approaches that are based on the distributions of the time to enrollment, time to event and time to loss of follow-up. These predictions are real-time and realistic in the sense that they are based on the accumulated data from trial itself and can be updated with new data. We demonstrate the methods by data from a randomized clinical trial. Analysis of Pediatric Obesity Studies: Difference in Conclusions Based on Choice of BMI-Derivate Outcome Renee H. Moore University of Pennsylvania [email protected] The rising rates of childhood obesity from the 1980s through 2000s, coupled with its various health-related and psychosocial consequences, has led to childhood obesity becoming a major public health concern of today. Obesity studies are concerned with weight change or weight maintenance and accordingly, weight is the consensus for primary outcome in adult studies. However, the interpretation of data from obesity clinical trials in children is complicated by the fact that children are normally experiencing weight and height changes over time. This has resulted in the use of many different outcomes to evaluate the success of obesity prevention and treatment trials and to identify risk factors associated with obesity. The utilization of different outcomes in the childhood growth and obesity literature has prevented comparison of interventions and the pooling of many research studies for meta-analyses. Furthermore, some investigations comparing properties of the different BMI-deviate outcomes have suggested that certain outcomes (e.g. BMI) are more appropriate than other outcomes (e.g. BMI z-score), depending on the study design and the obesity status of the participants. In order to bring the field closer to a consensus on the use of different BMI outcomes in pediatric weight and obesity studies, additional research comparing the various outcome choices is needed. The aim of this project is to compare the results and conclusions based on different outcomes currently utilized in the pediatric obesity literature and to provide insight into when a specific outcome may be more appropriate than another. 
In this project, we apply each outcome to existing data from three published pediatric studies: one observational study that includes healthy weight, overweight, and obese children; one obesity prevention clinical trial that also includes children of varying weight status; and one obesity treatment clinical trial that includes severely obese children.

Session 67: Panel Session I: Adaptive Designs—When Can and How Do We Get There From Here? Panel Discussion Sue-Jane Wang1 , Jerry Schindler2 , Vlad Dragalin3 , Lu Cui4 , H.M. James Hung1 , Jack J. Lee5 and Brenda Gaydos6 1 U.S. Food and Drug Administration 2 Merck & Co., Inc. 3 ADDPLAN, Aptiv Solutions 4 Eisai Co., Ltd. 5 The University of Texas MD Anderson Cancer Center 6 Eli Lilly and Company [email protected] In early to mid 2000, the adaptive design was enthusiastically advanced to regulatory proposals submitted by the pharmaceutical drug sponsors. The interests were driven by the potential tangible benefit of saving the drug development costs and the development time. However, the submission trend appeared to have slowed down after PhRMA Adaptive Design Workshop held in Bethesda November 2006. Since the release of the US FDA adaptive design draft guidance in 2010, the interest arises again and appears to be thoughtful in planning. Some of the proposals are exploratory in nature and others aim for confirmatory evidence. Some of the proposals are being implemented by the pharmaceutical drug sponsors though statistical validity may not be well understood. This panel session will feature three adaptive design paradigms. These paradigms have seen in regulatory applications. Some ongoing experiences from these paradigms will serve as the sounding board and prepare the panelists for discussion. The panelists consist of experienced statisticians from regulator, academia and pharmaceutical industry. Panelists: • Jerry Schindler, Merck; jerald [email protected] • Vladimir Dragalin, ADDPLAN; [email protected] • Lu Cui, Eisai; Lu [email protected] • H.M. James Hung, FDA; [email protected] • Jack J. Lee, MD Anderson Cancer Center; [email protected] • Brenda Gaydos, Eli Lilly and Company; GAYDOS [email protected]

Session 68: Theoretical Developments The Effect of Preliminary Unit Root Tests on the Prediction Intervals for the Gaussian Autoregressive Processes with Additive Outliers � Wararit Panichkitkosolkul and Sa-aat Niwitpong King Mongkut’s University of Technology North Bangkok [email protected] The preliminary unit root test has been found to be a useful tool for improving the accuracy of a one-step-ahead predictor for the firstorder autoregressive process when an autoregressive coefficient is ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts close to one. This paper applies the concepts of the preliminary unit root test in order to improve the efficiency of prediction intervals for the Gaussian autoregressive processes with additive outliers. The preliminary unit root tests considered are the augmented DickeyFuller (ADF) unit root test and the Shin et al.’s (SSL) unit root test. In addition, the analytic expressions of the coverage probability of prediction intervals are derived, and the structure of the coverage probability has proved that it is independent from the mean of the process and the parameter of the innovation, but it is a function of the autoregressive coefficients only. For the parameter estimation of processes, we use the generalized M-estimates. The coverage probabilities and the lengths of the standard prediction interval, the prediction interval following the ADF unit root test, and the prediction interval following the SSL unit root test are also compared via simulation studies. Simulation results have shown that the preliminary unit root test can help to improve the accuracy of the prediction intervals with additive outliers when the autoregressive processes are close to the non-stationary region. Robust Confidence Interval for the Means of Zero-Lognormal Distribution � Mathew Anthony C. Rosales1 and Magdalena NiewiadomskaBugaj2 1 Comsys 2 Western Michigan University [email protected] Zero-heavy data are very common in many disciplines such as insurance, medical research, life sciences, marine sciences and engineering. In this paper, robustness of the interval estimators for the mean of lognormal distribution with a positive mass at zero was investigated and a new method was proposed. In real life (e.g., in marine sciences), data that are said to be from lognormal distribution were often better fit by other skewed distributions. This is in addition to the fact that goodness-of-fit test could not reliably detect departure from lognormality of the positive observation when sample size is small. A comprehensive Monte-Carlo simulation study was performed and revealed that the proposed method outperforms other methods in terms of coverage probability and interval width when the data depart from the assumed model or are contaminated by extremely large values. Proof for the Underestimation of the Greenwood-Type Estimators Jiantian Wang Kean University [email protected] It has been already noticed that the Greenwood-type estimators for the variances of the Nelson-Aalen estimator (NA) and the KaplanMeier estimator (KM) will significantly underestimate the true variances of these estimators. However, theoretical confirmation for such conclusion has not been available yet. In this paper, we investigate some small sample properties of the NA and the KM under the Koziol-Green Model. By revealing the asymptotic structures of the variances of the NA and the KM up to O(n−2 ), we give a proof for the aforementioned result. Furthermore, some improved variance estimators and some elegant exact bias expressions of NA and KM have been derived. Characterization through Distributional Properties of Generalized Order Statistics A. H. Khan1 and � Imtiyaz A. Shah2 1 Department of Statistics and Operations Research, Aligarh Muslim University ICSA Applied Statistics Symposium 2011, NYC, June 26-29

2 Aligarh Muslim University
[email protected]
Distributional properties of two non-adjacent generalized order statistics have been used to characterize distributions. Further, one-sided contraction and dilation for the generalized order statistics are discussed, and the results are then deduced for adjacent generalized order statistics, order statistics and records.

Pitman Closest Equivariant Estimators and Predictors under Location-Scale Models
*Haojin Zhou and Tapan K. Nayak
George Washington University
[email protected]
First, we derive Pitman closest equivariant estimators of the location, scale and percentiles of a distribution in a location-scale family. Then, we adopt the Pitman closeness criterion for prediction problems, and derive best equivariant predictors under location, scale and location-scale models. One attractive feature of our approach is that the best estimators and best predictors are very robust with respect to the choice of the loss function. We also apply our theoretical results to some specific problems that are of much practical interest.

Session 69: Bayesian Inferences and Applications Application of Normal Dynamic Linear Model(NDLM) in a Clinical Trial James Ngai i3 Statprobe [email protected] Normal Dynamic Linear Models (NDLM) have been used in clinical adaptive designs. There are often pragmatic reasons for simpler trial designs than fully adaptive ones. From the point of view of study conduct, we still want flexibility in modelling dose-response: Potentially non-monotonic response, Dropping ineffective doses and Terminating the study early due to futility. The NDLM has certain benefits as the dose-response model in this case, it can handle a wide variety of possible dose-response curves, including nonmonotonic relationships. The dose-response may not be monotonic due to dropouts influencing change from baseline. It can be easily implemented in a Bayesian updating framework. Within this framework it provides direct probabilistic statements about many features of the dose-response. A statistical success decision rule is defined as: if there is at least one dose having statistically significant efficacy. The “statistically significant efficacy”ˇt is considered to be met if “the posterior probability that Serum change at week X is > 0.66 mg/dL”ˇt is > 60%. Based on previous historical result, we will consider 3 different scenarios of the Serum change at week X of each dose (mg/dL) : null case, minimum efficacy case and good efficacy case. Simulation results based on 10,000 simulated trials per scenario will be discussed and assuming that the standard deviation of Serum change at week X is W mg/dL. Using the current NDLM method in the study will have about 72% power to detect a minimum efficacy of W mg/dL mean Serum change at week X with a Type I error rate of about 20%. A critique of the NDLM method will be given as compared to the eMax model. A Bayesian Statistical Analysis Approach for Critical Success Factors Development � Grace Li, Fanni Natanegara, Ming-Dauh Wang and Paul Berg


Abstracts Eli Lilly and Company Li Ying [email protected] A critical success factor (CSF) needs to be evaluated in a quantifiable manner to guide better decision making in the drug development process. Ideally CSF(s) should be integrated with a study’s objectives. The parameter(s) to evaluate a CSF can be any measurement related to safety, efficacy, or a combination of the two. In early phase drug development, many of the studies including dose finding trials are not as well powered as confirmatory studies. Can a more quantifiable CSF be proposed with the aid of an appropriate statistical tool? In this presentation, we will share such an assessment of a CSF using a probability statement, rather than the traditional “statistically significant difference“ claim between study drug and placebo. A Bayesian approach will be applied to help the medical team to reach the goal of better decision making in the drug development process. Some examples will be given to illustrate how the method works. A Bayesian Approach for Early Pharmacokinetics Readout in Phase II Clinical Trials � Bo Jin1 and Yuyan Duan2 1 Biotherapeutics Statistics, Pfizer Inc. 2 Global Biometrics Sciences, Bristol-Myers Squibb [email protected] Early pharmacokinetics readout in Phase II clinical trials, which may occur after a small portion of patients data are available, is to evaluate whether the doses planned will achieve the targeted pharmacokinetics profiles or not. It may be considered as a useful interim check in the phase II clinical trials, especially when the previous knowledge on the pharmacokinetic characteristics of the drug was based on a different population (e.g., healthy subjects vs. patients) or a different drug formulation. In this presentation we suggest a Bayesian approach to make decisions on the early PK data. Specifically, an informative power prior is used to incorporate historical data into the current study, and the comparisons with traditional tests in terms of Type I and Type II errors will be presented. Theory and Method for Bayesian Inference of Cox Regression model with Gamma Process Priors in Presence of Ties Zhiyi Chi, � Arijit Sinha and Ming-Hui Chen University of Connecticut [email protected] Cox’s (1972) Proportional Hazards (PH) model is one of the most important models for fitting survival data in presence of covariates. Properties of the PH model under the Bayesian framework was discussed by Kalbfleisch (1978) when the prior distribution of the baseline cumulative hazard function is assumed to be a Gamma process. Although for continuous time-to-event data the theoretical probability that two or more events happen at the same time is zero, in a variety of real life data, event times are tied. In this paper, we carry out an in-depth theoretical investigation of the PH model with gamma process priors for tied event times. Several theorems are established to derive the likelihood function and its properties for right censored survival data with covariates and tied event times. In addition, an innovative simulation algorithm via direct forward sampling and Gibbs sampling is developed to generate tied failure times under this setup. The Conditional Predictive Ordinate (CPO) based Bayesian measure is derived for these complex models. A new Gibbs sampling algorithm via several sets of latent variables is developed to carry out all posterior computations. A simulation study is conducted and a real dataset is used to further illustrate the proposed methodology.
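For the early PK readout abstract above, the power prior idea can be sketched for a binary endpoint: the historical likelihood is raised to a discounting power a0 in [0, 1], which preserves conjugacy with a Beta prior. All counts and the choice of a0 below are hypothetical.

```r
# Power prior for a response rate: Beta(a, b) initial prior, historical data (y0, n0)
# discounted by a0, combined with current interim data (y1, n1).
a  <- 1; b  <- 1        # vague initial prior
y0 <- 18; n0 <- 30      # hypothetical historical responders / subjects
y1 <- 7;  n1 <- 15      # hypothetical current interim responders / subjects
a0 <- 0.5               # hypothetical discounting of the historical data

post_a <- a + a0 * y0 + y1
post_b <- b + a0 * (n0 - y0) + (n1 - y1)

1 - pbeta(0.50, post_a, post_b)   # posterior probability the response rate exceeds a 0.50 target
```

Setting a0 = 0 ignores the historical data entirely, while a0 = 1 pools it at full weight, so the comparison of type I and type II errors in the talk amounts to studying how this choice trades off borrowing against robustness.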


Bayesian Nonparametric Centered Random Effects Models with Variable Selection Mingan Yang Saint Louis University [email protected] In linear mixed effects model, it is common to assume the random effects to follow a parametric distribution such as normal distribution with mean zero. For variable selection in a linear mixed effects model, substantial violation of normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. For nonparametric random effects model, a challenge is to control the bias on the fixed effects by the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject specific random effects nonparametrically with Dirichlet process and resolve the bias simultaneously. The approach is implemented using a stochastic search Gibbs sampler to allow both fixed and random effects to be dropped effectively out of the model. Simulation and real data analysis are provided for illustration.

Session 70: Statistical Analysis on Spatial and Temporal Data Additive Hazards Regression and Partial Likelihood Estimation for Ecological Monitoring Data Across Space � Feng-Chang Lin1 and Jun Zhu2 1 Department of Biostatistics, University of North Carolina at Chapel Hill 2 Department of Statistics, University of Wisconsin-Madison [email protected] We develop continuous-time models for the analysis of environmental or ecological monitoring data such that subjects are observed at multiple monitoring time points across space. Of particular interest are additive hazards regression models where the baseline hazard function can take on flexible forms. We consider time-varying covariates and take into account spatial dependence via autoregression in space and time. We develop statistical inference for the regression coefficients via partial likelihood. Asymptotic properties, including consistency and asymptotic normality, are established for parameter estimates under suitable regularity conditions. Feasible algorithms utilizing existing statistical software packages are developed for computation. We also consider a simpler additive hazards model with homogeneous baseline hazard and develop hypothesis testing for homogeneity. A simulation study demonstrates that the statistical inference using partial likelihood has sound finite-sample properties and offers a viable alternative to maximum likelihood estimation. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin. Spatial-Temporal Analysis of Non-Hodgkin Lymphoma in a Case-Control Study David Wheeler National Cancer Institute [email protected] Investigating spatial-temporal patterns of disease incidence can identify areas with significantly different disease risks than the overall study population and can lead to the development of hypotheses to explain the pattern of risk. Little is known generally about the etiology of non-Hodgkin lymphoma (NHL), or about the time from possible environmental exposure to diagnosis that might be relevant for NHL. In this research, we sought to determine the time of ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts residential location that was most associated with NHL risk and determine if there were geographic areas of significantly elevated risk for NHL using this lag time of potential environmental exposure. We conducted a population-based case-control study of NHL in four National Cancer Institute (NCI)-Surveillance, Epidemiology, and End Results (SEER) centers: Detroit, Iowa, Los Angeles, and Seattle in 1998-2000. Complete 20-year residential histories were available for 842 cases and 680 controls. We used generalized additive models to model spatially the probability that an individual had NHL and to identify spatial clusters of elevated NHL risk. Models were adjusted for NHL risk factors including age, gender, race, education, and exposure to the pesticide chlordane. We fitted models at five different times of residence in a time frame of etiologic relevance and selected the model with the time most associated with risk of NHL. The time periods were time at diagnosis, and 5, 10, 15, and 20 years before diagnosis. Results of this study showed that a lag time of 20 years before diagnosis was most associated with risk of NHL, as the best model fit was for residences 20 years prior to diagnosis in Detroit (p-value = 0.07), Iowa (p-value = 0.14), and Los Angeles (p-value = 0.03). After adjusting for associated risk factors, there were areas of statistically significant risk in Detroit, Iowa, and Los Angeles at a 20-year time lag. On Modeling Ecological Monitoring Data in Space Jun Zhu University of Wisconsin-Madison [email protected] Different approaches are possible for modeling and analysing environmental or ecological monitoring binary data such that subjects are observed at multiple monitoring time points across space. Here I consider both discrete-time and continuous-time models. Of particular interest are autoregressive models and hazards regression models where the baseline hazard function can take on flexible forms. We compare statistical inference and computational algorithms in these two approaches. For illustration, we analyze data from an ecological study that monitors bark beetle colonization of red pines in a plantation of Wisconsin. Hierarchical Dynamic Modeling of Outbreaks of Mountain Pine Beetle Yanbing Zheng University of Kentucky [email protected] Outbreaks of mountain pine beetle have caused landscape-level mortality to mature trees in pine forests in western North America. Here we examine the historical outbreak of mountain pine beetle on the Chilcotin Plateau in British Columbia, Canada, from 1972 to 1986. We develop a generalized linear mixed model with spatialtemporal random effect to trace the origin of the outbreak and test whether the outbreak tends to build along Klinaklini, Chilcotin, and Fraser Rivers or along all six drainages on the plateau. We propose a Bayesian hierarchical model for statistical inference and devise Markov chain Monte Carlo algorithms for computation.
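A minimal sketch of the generalized additive model approach used in the NHL case-control study above, smoothing disease status over residential coordinates with adjustment covariates, might look as follows; the data are simulated, the variable names are hypothetical, and the mgcv package is assumed.

```r
library(mgcv)

# Simulate a hypothetical case-control data set with one raised-risk area
set.seed(2011)
n   <- 600
nhl <- data.frame(x = runif(n), y = runif(n), age = rnorm(n, 60, 10))
risk <- plogis(-0.5 + 1.5 * exp(-((nhl$x - 0.3)^2 + (nhl$y - 0.7)^2) / 0.05) +
               0.02 * (nhl$age - 60))
nhl$case <- rbinom(n, 1, risk)

# Logistic GAM with a two-dimensional smooth of residential location
fit <- gam(case ~ s(x, y) + age, family = binomial, data = nhl)
summary(fit)                                              # does location add to NHL risk?
vis.gam(fit, view = c("x", "y"), plot.type = "contour")   # fitted spatial risk surface
```

Fitting the same model with coordinates taken from residences at different lag times (diagnosis, 5, 10, 15, 20 years earlier) and comparing model fit is one way to implement the lag-selection idea described above.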

Session 71: Recent Developments in Methods for Handling Missing Data Bayesian Analysis for Mixtures of Incomplete Continuous, Ordinal and Nominal Repeated Measures � Xiao Zhang1 , John Boscardin2 and Tom Belin3 1 University of Alabama, Birmingham 2 University of California, San Francisco ICSA Applied Statistics Symposium 2011, NYC, June 26-29

3

University of California, Los Angeles [email protected] From a Bayesian perspective, we propose a unified model for analyzing incomplete mixtures of continuous, ordinal and nominal repeated measures that combines multivariate normal linear regression models for repeated continuous variables, multivariate probit models for repeated ordinal variables, and multivariate multinomial probit models for repeated nominal variables. Since both the multivariate probit model and the multivariate multinomial probit model assume underlying normal variables for each ordinal and nominal variable, respectively, we combine the multivariate normal variables for continuous data and the underlying normal variables from the multivariate probit model and the multivariate multinomial probit model. Then we describe a generalized linear model for the mixture of continuous, ordinal and nominal repeated outcomes based on the combined normal variables and develop an MCMC algorithm to estimate covariate effects, cut-points for the ordinal data, and the covariance matrix for the continuous measures, both observed and latent. Combining the available variables allows one to model the mixture of continuous, ordinal and nominal repeated data simultaneously, leading in a natural way to a Monte Carlo Markov Chain (MCMC) algorithm incorporating flexible priors for the covariance matrix. The resulting model offers a straightforward framework for inference about the association structure among diverse measures. We consider both missing-at-random and missing-not-at-random missing-data mechanisms, and we use simulated examples and an application to REGARDS (The Reasons for Geographic and Racial Differences) in a stroke study to illustrate the method. Diagnosing Imputation Models by Applying Target Analyses to Posterior Replicates of Completed Data � Yulei He and Alan M. Zaslavsky Harvard Medical School [email protected] Multiple imputation fills in missing data with posterior predictive draws from imputation models. To assess the adequacy of imputation models, we can compare completed data to their replicates simulated under the imputation model. We apply analyses of substantive interest to both datasets, and use posterior predictive checks of the differences of these estimates to quantify the evidence of model inadequacy. We can further integrate out the imputed missing data and their replicates over the completed-data analyses to reduce variance in the comparison. We also consider strategies addressing the conservatism of posterior predictive p-values. In many cases, the checking procedure can be easily implemented using standard imputation software by treating re-imputations under the model as posterior predictive replicates. Thus it can be applied for non-Bayesian imputation methods. We also sketch several strategies for applying the method in the context of practical imputation analyses. We illustrate the method using a simulation study and two real data applications. Bayesian Modeling and Inference for the Data with Informative Switches or Dropouts of Planned Treatments � Ming-Hui Chen1 , Qingxia Chen2 , Joseph G. Ibrahim3 and David Ohlssen4 1 University of Connecticut 2 Vanderbilt University 3 University of North Carolina at Chapel Hill 4 Novartis Pharmaceuticals Corporation [email protected] In randomized clinical trials, it is common that patients may stop taking their assigned treatments and then start the standard treat-

89

Abstracts ment or completely dropout from the study. In addition, patients may miss scheduled visits even staying in the study. This type of missingness is called an intermitent missing. In this paper, we develop a novel Bayesian method for jointly modeling longitudinal treatment measurements and various dropouts. Specifically, we propose a multivariate normal mixed effects model for repeated measurements from the assigned treatments and the standard treatment, a multivariate logistic regression model for stopping the assigned treatments, a conditional logistic regression for starting the standard treatment, and a conditional multivariate logistic regression model for completely withdrawing from the study. We assume that withdrawing from the study are non-ignorable but intermitent missings are at random.Various properties of the proposed model are examined. An efficient Markov chain Monte Carlo sampling algorithm is developed. A real data set from a clinical trial is analyzed in detail via the proposed method.

Methods for Evaluating ED Interventions Wendy Lou University of Toronto [email protected] Time-efficient diagnosis and treatment of patients presenting at the emergency department (ED) are essential for positive health outcomes. Strategies for reductions in wait times at various stages from triage to discharge are often implemented within a healthcare system, but methods for evaluating the effectiveness of these interventions sometimes involve unrealistic assumptions. Methodology that allows for systematic assessment of changes attributed to interventions, for example in ED wait times, under more realistic settings will be presented. The approach utilizes the information derived from retrospective and prospective system monitoring through statistical process control methods. A motivating example from a study involving multiple hospitals of a health system will be given to illustrate the methodology. The challenges as well as opportunities for such statistical methodologies will be discussed.

Session 72: Methodology for and Applications of Administrative Data

Public Health Administrative Data, Bayesian Disease Mapping, and Population Health Information and Policy Ying MacNab The University of British Columbia [email protected] Public health administrative data, such as provincial-or nationallevel vital statistics data on mortality or health service administrative data on hospital care utilization and outcomes, have been indispensable data sources for geographical mapping of mortality and health service utilization and outcome. In fact, the availability of disease and health-related data at local administrative areal levels has motivated considerable research interests on Bayesian statistical methods developments for small area disease and health mapping. This presentation gives a review on the key developments of the Bayesian disease mapping methodology and on applications of disease mapping in the context of population health information, regional public health, and evidence-based health policy.

Capture-Recapture Techniques to Estimate Chronic Disease Prevalence in the Presence of Misclassification Error in Administrative Health Databases � Lisa Lix and Chel Hee Lee University of Saskatchewan [email protected] Background: Administrative databases, which were developed for physician remuneration and health system management, are increasingly used for chronic disease surveillance and research. Capturerecapture (CR) techniques have been proposed to estimate the size of chronic disease populations from administrative databases. Many CR techniques rest on assumptions that will not be satisfied, including the assumption of accurate classification of disease cases. Purpose and Objectives: The purpose is to develop and compare CR techniques to adjust for misclassification error when estimating population size. The objectives are to investigate two-source CR techniques: (a) Chao lower-bound estimator, (b) Chao lower bound estimator with adjustment for misclassification error, (c) multinomial logistic regression, (d) multinomial logistic regression with adjustment for misclassification error. Methods: Computer simulation was used to compare the methods; simulation parameters were based on analyses of existing databases. Correlated binary data were generated for positive predictive values (PPVs) of 55%, 80%, or 100% in source 1 and a PPV of 100% in source 2. Covariates associated with heterogeneity of capture probability were generated assuming a normal or multinomial distribution. Population estimates were adjusted for misclassification error based on prior information about PPV and uncertainty in the estimates was modeled using its posterior distribution. Bias and mean square error (MSE) of population size estimates were based on 1000 replications. Results: None of the CR techniques was robust to both dependence and heterogeneity. When the PPV for source 1 was 100%, the techniques resulted in negatively biased estimates with values from −.03 to −.24 for Chao’s estimator and −.06 to −.26 for multinomial logistic regression. Chao’s estimator was less sensitive to high correlation between data sources. Simulations for other values of PPV are in progress. Conclusions: CR models to minimize the effects of assumption violations on population size estimates should be applied to administrative health databases.

90

A Case Study on Analysis of Administrative Health Data in the Era of Knowledge Translation � Rhonda J Rosychuk and Amanda S Newton University of Alberta [email protected] As statistical researchers, we develop new methods and examine the properties of existing methods. Typically these activities occur in isolation or are motivated by an applied problem presented by subject matter experts. While a biostatistician may provide an analysis or solution to a real health-related research problem, the results may not directly translate to policy and patient care. In the era of Knowledge Translation (KT), researchers and users of research are encouraged to exchange, synthesize, and apply knowledge in order to improve the health of communities and the health care system. Research funders often require a detailed KT plan. Biostatisticians can be important contributors to the KT efforts by ensuring proper analysis and interpretation of results; however, biostatisticians may lack the expertise to fully engage in the KT process, especially if they initiate a research project. We will discuss an ongoing research collaboration in pediatric emergency mental health care that has provided the opportunity for KT activities. We have extracted emergency department visits, physician claims, and hospitalizations from large, population-based provincial databases in Alberta, Canada. We will outline our comprehensive data analyses, advocacy for more sophisticated statistical analyses, plans for ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts innovative methods, interactions with stakeholders, and KT activities.

Session 73: Fiducial Inference, Generalized Inference, and Applications Inferential Procedures Based on the Generalized Variable Approach with Applications � Krishnamoorthy Kalimuthu University of Louisiana at Lafayette [email protected] The generalized p-value has been introduced by Tsui and Weerahandi (1989, JASA) and the generalized confidence interval by Weerahandi (1993, JASA). The concepts of generalized p-values and generalized confidence intervals have turned out to be extremely fruitful for obtaining tests and confidence intervals involving “nonstandard” parameters, such as the log normal mean and quantiles in one-way random model. In this talk, I will first explain a method of constructing a generalized pivotal quantity for a parameter in a general setup. Then, construction of generalized pivotal quantities and inferential procedures based on them will be outlined for normal parameters, lognormal mean and to compare two lognormal means. I will briefly explain the applications of the generalized variable (GV) approach for setting tolerance limits in one-way random model, for correlation analysis in a multivariate normal distribution and finding one-sided limits for stress-strength reliability involving two-parameter exponential distributions. I will also compare the results based on the GV approach with those of the other methods, and illustrate the results using practical examples. Some Applications of Generalized Variable Approach in Diagnostic Studies � Lili Tian1 , Tuochuan Dong1 and Chengjie Xiong2 1 State University of New York at Buffalo 2 Washington University at St. Louis [email protected] Receiver-operating characteristic (ROC) curve is a standard tool for evaluating the performance of diagnostic tests. The area under the ROC curve (AU C) and the partial area under ROC curve (P AU C) have been widely used as quantitative indexes of discriminating ability of a continuous biomarker between two states of a disease. In practice, there exist many disease processes with three ordinal disease stages; i.e. “healthy”, “early diseased stage” and “diseased“. For biomarkers or diagnostic tests for discriminating among three disease stages, the true classification rate p 2 for “early diseased stage” is a function p 1 and p 3 where p 1 and p 3 are the true classification rates for “healthy” and “diseased” groups respectively, and therefore defines a surface in the three-dimensional space (p 1, p 2, p 3), called ROC surface. The volume under surface (V U S) and the partial volume under surface have been used as summary measures for the diagnostic accuracy. Despite the fact that there exist many statistical literatures for related topics in diagnostic studies, there still exist many unsolved problems, and for some solved problems, the existing methods are very clumsy. Motivated by a study on Alzheimer’s disease from the Washington University Knight Alzheimer’s Disease Research Center, this talk presents a few topics in this field for which the generalized method, proposed by Tsui and Weerahandi (JASA, 1989) and Weerahandi (JASA, 1993), provides elegant and excellent solutions. On Generalized Fiducial Inference Jan Hannig ICSA Applied Statistics Symposium 2011, NYC, June 26-29

University of North Carolina at Chapel Hill [email protected] R. A. Fisher’s fiducial inference has been the subject of many discussions and controversies ever since he introduced the idea during the 1930’s. The idea experienced a bumpy ride, to say the least, during its early years and one can safely say that it eventually fell into disfavor among mainstream statisticians. However, it appears to have made a resurgence recently under the label of generalized inference. In this new guise fiducial inference has proved to be a useful tool for deriving statistical procedures for problems where frequentist methods with good properties were previously unavailable. Therefore we believe that the fiducial argument of R.A. Fisher deserves a fresh look from a new angle. In this talk we first generalize Fisher’s fiducial argument and obtain a fiducial recipe applicable in virtually any situation. We demonstrate this fiducial recipe on examples of varying complexity. We also investigate, by simulation and by theoretical considerations, some properties of the statistical procedures derived by the fiducial recipe showing they often posses good repeated sampling, frequentist properties. Portions of this talk are based on a joined work with Hari Iyer, Thomas C.M. Lee and Jessi Cisewski

Session 74: Functional Data Analysis Simultaneous Curve Registration and Clustering for Functional Data � Xueli Liu1 and Mark C.K. Yang2 1 City of Hope 2 University of Florida [email protected] Study of dynamic processes in many areas of science has led to the appearance of functional data sets. It is often the case that individual trajectories vary both in the amplitude space and in the time space. In the analysis of such data, the phase variation will often pose severe problems. For example, clustering is an important tool to search for homogeneous subgroup pattern curves. The ignorance of the phase variability may lead to poorly represented pattern curves and less accurate clustering results. We develop a coherent clustering procedure that allows for temporal aligning. Under this framework, closed form solutions of an EM type learning algorithm are derived. The method can be applied to all types of curve data but is particularly useful when phase variation is present. We demonstrate the method by both simulation studies and an application to human growth curves. A New Semiparametric Estimation Method for Accelerated Hazard Model � Jiajia Zhang1 , Yingwei Peng2 and Ou Zhao1 1 University of South Carolina 2 Queen’s University [email protected] The accelerated hazard model has been proposed for more than a decade. However, its application is still very limited, partly due to the complexity of the existing semiparametric estimation method. We propose a new semiparametric estimation method based on a kernel-smoothed approximation to the limit of a profile likelihood function of the model. The method leads to smooth estimating equations and is easy to use. The estimates from the method are proved to be consistent and asymptotically normal. Our numerical study shows that the new method is more efficient than the existing

91

Abstracts method. The proposed method is employed to reanalyze the data from a brain tumor treatment study. Analysis for Temporal Gene Expressions under Multiple Biological Conditions Hong-Bin Fang1 and � Dengliang Deng2 1 University of Maryland 2 University of Regina [email protected] Temporal gene expression data are of particular interest to researchers as it contains rich information in characterization of gene function and have been widely used in biomedical studies and cancer early detection. However, the current temporal gene expressions usually have few measuring time series levels, extracting information and identifying efficient treatment effects without loss temporal information are still in problem. A dense temporal gene expression data in bacteria shows that the gene expression has various patterns under different biological conditions. Instead of analysis of gene expression levels, we consider the relative change-rates of gene in the observation period in this paper. We propose a non-linear regression model to characterize the relative change-rates of genes, in which individual expression trajectory is modeled as longitudinal data with changeable variance and covariance structure. Then, based on the parameter estimates, a chi-square test is proposed to test the equality of gene expressions. Furthermore, the Mahalanobis distance is used for the classification of genes. The proposed methods are applied to the dataset of 32 genes in P. aeruginosa expressed in 39 biological conditions. The simulation studies show that our methods are well performance for analysis of temporal gene expressions. Discussion Hong-Bin Fang University of Maryland School of Medicine [email protected] In this invited Session, there are three invited talks on functional data analysis. I will give a review and the further development on functional data analysis.

Session 75: Statistical Method and Theory for HighDimensional Data Control of Generalized False Discovery Proportion � Sanat Sarkar1 and Wenge Guo2 1 Temple University 2 New Jersey Institute of Technology [email protected] The idea of improving the traditional, and often too conservative, notion of familywise error rate (FWER) has been one of the main motivations behind much of the methodological developments taken place in modern multiple testing. One particular direction in which this idea has flourished is generalizing the notion of FWER from its original definition of the probability of at least one false rejection or a non-zero fraction of false rejections to one that allows more but tolerable number or fraction of false rejections. In many situations, particularly in microarray and brain imaging studies, where a large number of hypotheses are tested, one is willing to tolerate a few more than one false rejection but wants to control too many of them. Also, often it is extremely unlikely that exactly one hypothesis will be falsely rejected when there is high positive dependence among a group or groups of test statistics corresponding to true null hypotheses. This happens, for instance, in microarray

92

experiments where the genes involved in the same biological process or pathway are highly dependent on each other and exhibit similar expression patterns. In all such cases, a procedure controlling the probability of at least k false rejections, the k-FWER, for some fixed k > 1, or the probability of the false discovery proportion (FDP) exceeding gamma, the gamma-FDP, for some fixed 0 < gamma < 1, will have a better ability to detect more false null hypotheses than the corresponding FWER procedure (i.e., when k = 1 or gamma = 0). However, if one is willing to tolerate at most k − 1 false rejections, the notion of gamma-FDP does not completely take that into account unless it is generalized accordingly. In this paper, we will introduce such a generalized notion of gamma-FDP and present procedures that control it. Large-Scale Multiple Testing under Dependence � Wenguang Sun1 and Tony Cai2 1 North Carolina State University 2 University of Pennsylvania [email protected] The impact of dependence is an important topic in large-scale multiple testing and has been extensively studied in the literature. However, the discussions have focused on the validity issue, and the important optimality issue is largely ignored. This talk considers multiple testing under dependence in a compound decision theoretic framework. For data generated from an underlying two-state hidden Markov model, we construct oracle and asymptotically optimal data-driven procedures that aim to minimize the false non-discovery rate (FNR) subject to a constraint the false discovery rate (FDR). Both theoretical properties and numerical performances of the proposed procedures are investigated. It is shown that the proposed procedures control the FDR at the desired level, enjoy certain optimality properties and are especially powerful in identifying clustered non-null cases. The results show that the power of tests can be substantially improved by adaptively exploiting the dependency structure among hypotheses, and hence conventional FDR procedures that ignore this structural information are inefficient. Extensions beyond the HMM for set-wise inference and pattern identification, as well as applications in spatial data and time-course data analyses will be discussed if time permits. Sparse Estimation of Conditional Graphical Models with Application � Bing Li The Pennsylvania State University [email protected] In many applications the graph structure in a network arises from two sources: intrinsic connections and connections due to external effects. We introduce a sparse estimation procedure for graphical models that is capable of isolating the intrinsic connections by removing the external effects. Technically, this is formulated as a em conditional graphical model, in which the external effects are modeled as predictors, and the graph is determined by the nonzero entries of the conditional precision matrix. We introduce two sparse estimators of the conditional precision matrix using reproducing kernel Hilbert space combined with lasso and adaptive lasso. We establish the sparse property, variable selection consistency, oracle property, and derive the explicit asymptotic distributions of the proposed estimators for a specific type of reproducing kernel. The methods are compared with sparse estimators for unconditional graphical models, and with the constrained maximum likelihood estimate that assumes a known graph structure. 
Finally, the methods are applied to a genetic data set to construct a gene network after removing the effects of single-nucleotide polymorphisms.

On the Generalization of the BH Procedure
Jiashun Jin1 and *Zhigen Zhao2
1 Carnegie Mellon University, 2 Temple University
[email protected]
For the right-sided test, the procedure of Benjamini and Hochberg (1995) chooses the ideal rejection region as [x_0, +∞) asymptotically. The procedure is optimal under the assumption that the likelihood ratio function h(x) is monotone increasing in x. When h(x) is not monotonically increasing, the ideal rejection region is the one cut by the likelihood ratio function h(x), a union of several disconnected intervals. In a real data example, the function h(x) is unimodal, and the ideal rejection region is an interval [R_L(x), R_R(x)]. Under the unimodal setting, the BH procedure fails to be optimal even in the ideal case. In the data-driven approach, we estimate the two end points of the ideal rejection region based on the empirical CDF, which is known to converge uniformly to the true CDF at rate O_p(n^(-1/2)). We can thus show that the convergence rate of these two end points is O_p(n^(-1/4)). On the other hand, the local fdr based approaches are optimal in the ideal case. However, they require uniformly good estimates of the density function, which are usually unattainable. Consequently, these approaches will not have as sharp a convergence rate as ours. At the end of the talk, we will compare various procedures in two real data analyses, demonstrating that the proposed approach works best in terms of false discovery rate control and power in detecting the true signals. Please come and see how, and why, it works. This is joint work with Dr. Jiashun Jin.
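
As background for the generalization just described, the following is a minimal R sketch of the standard Benjamini-Hochberg (1995) step-up rule that the talk extends; the simulated p-values and the 0.10 FDR level are illustrative assumptions only, not data or settings from the talk.

# Standard BH step-up procedure on simulated p-values (illustration only)
set.seed(1)
p <- c(rbeta(50, 0.2, 4),    # 50 "signal" p-values, skewed toward zero
       runif(950))           # 950 null p-values
alpha <- 0.10                # assumed target FDR level
m <- length(p)

# BH by hand: find the largest k with p_(k) <= (k/m) * alpha
ord <- order(p)
k <- max(c(0, which(sort(p) <= (1:m) / m * alpha)))
rejected <- ord[seq_len(k)]

# The built-in adjustment gives the same rejection set
adj <- p.adjust(p, method = "BH")
setequal(rejected, which(adj <= alpha))   # TRUE: identical rejection sets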

Session 76: Statistical Genomics

Concordant Gene Set Enrichment Analysis of Two Large-Scale Expression Data Sets
Yinglei Lai
George Washington University
[email protected]
Recent large-scale technologies such as microarrays and RNA-seq allow us to collect genome-wide expression profiles for biomedical studies. Genes showing significant differential expression are potentially important biomarkers. A gene set enrichment analysis enables us to identify groups of genes (e.g., pathways) showing coordinated differential expression. Genes and gene sets showing consistent behavior in two related studies can be of great biological interest. However, since the sample sizes are usually small but the numbers of genes are large, it is difficult to identify truly differentially expressed genes and to determine whether a gene or a gene set behaves concordantly in two related studies. We have recently shown that a mixture model based approach can be an efficient solution for the concordant analysis of differential expression in two two-sample large-scale expression data sets. The advantage of the mixture model based approach is that the probability of a particular behavior (up-regulated or down-regulated) can be estimated for a given gene. Thus, it is feasible to address how likely it is that a gene shows concordant behavior in both data sets. In this study, we extend this approach to the concordant gene set enrichment analysis.
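
To make the idea of per-gene concordance concrete, here is a tiny R illustration of how posterior probabilities of up- and down-regulation from two studies could be combined into a concordance probability; the probabilities below are invented for illustration and are not output of the authors' mixture model.

# Illustration only: combine per-gene posterior probabilities of up/down
# regulation from two studies into a probability of concordant behavior.
combine_concordance <- function(p_up1, p_down1, p_up2, p_down2) {
  p_up1 * p_up2 + p_down1 * p_down2   # same direction in both studies
}
# Example gene: strongly up-regulated in study 1, moderately up in study 2
combine_concordance(p_up1 = 0.90, p_down1 = 0.02,
                    p_up2 = 0.70, p_down2 = 0.05)   # approximately 0.63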

Generalized Poisson Model for RNA-seq Data Analysis
Sudeep Srivastava and *Liang Chen
Department of Biological Sciences, University of Southern California
[email protected]
RNA-Seq has been a revolutionary tool to characterize and quantify the transcriptome of a given cell. In order to quantify the transcriptome accurately, an appropriate model is needed to separate the bias in the sequencing process from the true expression rates of regions of interest. This is a challenging problem because the source of the sequencing bias is largely unknown, and the bias affects many aspects of downstream transcriptome analyses. We proposed a Generalized Poisson (GP) model to address these difficulties. Specifically, we developed a two-parameter GP model for the position-level read counts. We showed that the GP model fits RNA-seq data much better than the traditional Poisson model. Based on the GP model, we can better estimate gene or exon expression, perform a more reasonable normalization across different samples, and improve the identification of differentially expressed genes and of differentially spliced exons. The usefulness of the GP model is demonstrated by applications to multiple RNA-seq data sets.

Joint Estimation of Multiple Gaussian Graphical Models by Nonconvex Penalty Functions with an Application to Genomic Data
*Hyonho Chun1, Xianghua Zhang2 and Hongyu Zhao2
1 Purdue University, 2 Yale University
[email protected]
Inferring unknown gene regulation networks is one of the key questions in systems biology, with important applications such as understanding disease physiology and drug discovery. These applications require inferring multiple networks in order to reveal the differences among different conditions. The multiple networks can be inferred with Gaussian graphical models by introducing sparsity on the inverse covariance matrices via penalization, either individually or jointly. We propose a class of nonconvex penalty functions for the joint estimation of multiple Gaussian graphical models. Our approach is capable of regularizing both common and condition-specific associations without explicit parametrization, and it has the oracle property for both common and specific associations. We demonstrate the performance of our nonconvex penalty functions in a simulation study and then apply them to a real genomic data set.

Genetic Risk Predictions from Genome Wide Association Studies
Ning Sun
Yale University
[email protected]
Recent genome wide association studies (GWAS) have identified many genetic variants affecting complex human diseases, and it is estimated that hundreds or thousands of genetic variants are related to common disease etiologies. It is of great interest to translate the very extensive and valuable data collected from GWAS into reliable risk prediction models in order to identify individuals at high risk for different diseases. Because typical GWAS consider up to millions of genetic markers, such data present unprecedented statistical challenges in detecting association signals and selecting a subset of markers to develop risk prediction models. In this talk, we explore several statistical approaches for prediction in this very high dimensional setting and compare the performance of these methods using GWAS data.
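
As a concrete, hedged illustration of the kind of model described in the first abstract of this session, the sketch below fits a two-parameter generalized Poisson distribution to counts by maximum likelihood in R. The Consul-Jain parameterization and the simulated stand-in counts are assumptions made for illustration; the authors' exact parameterization, covariates, and estimation procedure may differ.

# Illustration only: ML fit of a two-parameter generalized Poisson distribution
dgpois_log <- function(x, theta, lambda) {
  # log P(X = x) = log(theta) + (x-1) log(theta + lambda x) - (theta + lambda x) - log(x!)
  log(theta) + (x - 1) * log(theta + lambda * x) - (theta + lambda * x) - lfactorial(x)
}

negll <- function(par, x) {
  theta  <- exp(par[1])      # keep theta > 0
  lambda <- plogis(par[2])   # keep 0 < lambda < 1 (allows over-dispersion)
  -sum(dgpois_log(x, theta, lambda))
}

set.seed(2)
counts <- rnbinom(500, mu = 5, size = 2)   # stand-in for position-level read counts
fit <- optim(c(log(mean(counts)), 0), negll, x = counts)
c(theta = exp(fit$par[1]), lambda = plogis(fit$par[2]))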


Session 77: Assessment of Blinding and Placebo Effect

Blinding Assessment in Clinical Trials: A Review of Statistical Methods and a Proposal of Blinding Assessment Protocol
*Heejung Bang1, Stephen Flaherty2, Jafar Kolahi3 and Jongbae Park2
1 Weill Cornell Medical College, 2 University of North Carolina at Chapel Hill, 3 Isfahan University of Medical Sciences
[email protected]
There is strong consensus in the clinical trial community that blinding is an important issue in randomized controlled trials. At present, grossly incomplete reporting of blinding procedures and of any assessment of blinding still prevails. The term “double-blind” has almost become a convention without any checks or balances. In addition, there is a lack of consensus in the literature on quantitative procedures for evaluating the success of blinding. In this talk, we will review statistical methods of blinding assessment along with software options, and discuss some of the most pressing issues surrounding the acquisition, interpretation, and reporting of blinding data. We also propose a sample blinding assessment protocol to address some of these issues. Finally, we will discuss the CONSORT 2010 changes on blinding issues.

Design and Assessment of Blinding in Medical Device Trials
*Alvin Van Orden and Martin Ho
U.S. Food and Drug Administration, CDRH
[email protected]
Blinding is an essential part of establishing the quality of evidence presented in clinical trials of medical devices. It prevents biases caused by knowledge of the treatment assignment among study subjects, investigators, and outcome evaluators. It also maintains the compliance and retention of the study subjects. Doubts are often raised about whether the subject can feel the device working or whether adverse events hint to the subject which treatment is being received. Therefore, an appropriate assessment can provide valuable supporting evidence of successful blinding to ease concerns when the study results are interpreted. In this presentation, we will discuss some general regulatory considerations about designing, implementing and assessing blinding in medical device studies. Then, we will present a case study where the blinding assessment established the effectiveness of the masking. In a trial where the effectiveness of the device was marginal, it was important to establish that the small effect was not a product of a lack of blinding.

Placebo Effect-Adjusted Assessment of Quality of Life in Placebo-Controlled Clinical Trials
Jens Eickhoff
University of Wisconsin-Madison
[email protected]
Quality of life (QoL) has become an accepted and widely used endpoint in clinical trials. The analytical tools used for QoL evaluations in clinical trials differ from those used for the more traditional endpoints, such as response to disease, overall survival or progression-free survival. Since QoL assessments are generally performed on self-administered questionnaires, QoL endpoints are more prone to a placebo effect than traditional clinical endpoints. The placebo effect is a well-documented phenomenon in clinical trials, which has had dramatic consequences for the clinical development of new therapeutic agents. In order to account for the placebo effect, a multivariate latent variable model is proposed, which allows for misclassification in the QoL item responses. The approach is flexible in the sense that it can be used for the analysis of a wide variety
of multi-dimensional QoL instruments. The approach is illustrated with an analysis of data from a cardiovascular phase III clinical trial.
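
For readers unfamiliar with what blinding-assessment data look like, the toy R example below cross-classifies actual assignment with subjects' end-of-trial guesses and applies one very simple check, a chi-square test of independence; the counts are invented, and this is not one of the specific indices or protocols reviewed in the session above.

# Toy blinding-assessment data (invented counts): actual arm vs. subject's guess
guesses <- matrix(c(40, 35, 25,    # treatment arm: guessed trt / placebo / don't know
                    30, 42, 28),   # placebo arm:   guessed trt / placebo / don't know
                  nrow = 2, byrow = TRUE,
                  dimnames = list(arm   = c("treatment", "placebo"),
                                  guess = c("treatment", "placebo", "dont_know")))
chisq.test(guesses)   # here, no strong evidence that guesses track assignment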

Session 78: Recent Advances in Survival Analysis and Clinical Trials

Semiparametric Additive Transformation Model under Current Status Data
*Guang Cheng and Xiao Wang
Purdue University
[email protected]
We consider the efficient estimation of the semiparametric additive transformation model with current status data. A wide range of survival models and econometric models can be incorporated into this general transformation framework. We apply the B-spline approach to simultaneously estimate the linear regression vector, the nondecreasing transformation function, and a set of nonparametric regression functions. We show that the estimate of the parametric component is semiparametrically efficient in the presence of multiple nonparametric nuisance functions. An explicit consistent B-spline estimate of the asymptotic variance is also provided. All nonparametric estimates are smooth, and are shown to be uniformly consistent and to converge faster than the cubic rate. Interestingly, we observe a convergence rate interference phenomenon, i.e., the convergence rates of the B-spline estimators are all slowed down to equal the slowest one. Constrained optimization is not required in our implementation. Numerical results are used to illustrate the finite sample performance of the proposed estimators.

Evaluating Optimal Treatment Policies Based on Gene Expression Profiles
*Ian McKeague1 and Min Qian2
1 Columbia University, 2 University of Michigan
[email protected]
This talk discusses optimal treatment policies based on interactions between treatment and gene expression, and determined by thresholds in gene expression at a small number of loci along a chromosome. Such treatment policies are easier to interpret and implement (in bioassays, say) than policies based on complete gene expression profiles. By formulating the statistical problem in terms of a sparse functional linear regression model, we show how data from randomized clinical trials can be used to simultaneously evaluate the effectiveness of the treatment policies (measured in terms of mean outcome when all patients follow the policy), and to locate genes that optimize the interaction effect over competing treatments.

A Comparison of Multiple Imputation via Chained Equations and General Location Model for Accelerated Failure Time Models with Missing Covariates
*Lihong Qi1, Yulei He2, Rongqi Chen1 and Xiaowei Yang1
1 University of California, Davis, 2 Harvard University
[email protected]
Missing covariates are common in biomedical studies with survival outcomes. Multiple imputation is a practical strategy for handling this problem, with various approaches and software packages available for implementation. In this talk, I will compare two important approaches: multiple imputation by chained equations (MICE) and multiple imputation via a general location model (GLM) for accelerated failure time (AFT) models with missing covariates. Through a comprehensive simulation study, we investigated the performance

of the two approaches and their robustness toward violation of the GLM assumptions and model misspecifications, including misspecification of the covariance structure and of the joint distribution of continuous covariates. Simulation results show that MICE can be sensitive to model misspecification and may generate biased results with inflated standard errors, while GLM can still yield estimates with reasonable bias and coverage in these situations. MICE is flexible to use but lacks a clear theoretical rationale and suffers from potential incompatibility of the conditional regression models used in imputation. In contrast, GLM is theoretically sound and can be rather robust toward model misspecifications and violations of the GLM assumptions. Therefore, we believe that GLM shows the potential for being a competitive and attractive tool for tackling the analysis of AFT models with missing covariates.

Semiparametric Modeling and Inference under Possible Treatment-Time Interaction in Clinical Trials for Time to Event Data
Song Yang
National Heart Lung and Blood Institute
[email protected]
The hazard ratio provides a natural target for assessing a treatment effect with survival data, with the Cox proportional hazards model providing a widely used special case. In general, the hazard ratio is a function of time and provides a visual display of the temporal pattern of the treatment effect. A variety of procedures have been proposed to deal with possibly non-proportional hazards, such as nonparametric methods, the piecewise Cox model, or a Cox model with a covariate defined to accommodate non-proportionality. We present some of the recent advances on a semiparametric model that allows a wide range of time-varying hazard ratio shapes. Simultaneous confidence intervals for the hazard ratio function as well as the average hazard ratio function, and some omnibus tests for checking the model, will be discussed. These procedures and comparisons with other procedures will be illustrated in applications to data from the Women's Health Initiative estrogen plus progestin clinical trial.
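
The sketch below is not the semiparametric model of the last talk above; it only shows a standard exploratory way, using R's survival package and its bundled veteran data, to check proportional hazards and to let a treatment effect vary with time, i.e., the kind of non-proportionality these methods address.

# Standard exploratory checks for a time-varying treatment effect (illustration only)
library(survival)

fit_ph <- coxph(Surv(time, status) ~ trt, data = veteran)
cox.zph(fit_ph)                      # test of the proportional hazards assumption

# Let the log hazard ratio for treatment change with log(t)
fit_tv <- coxph(Surv(time, status) ~ trt + tt(trt), data = veteran,
                tt = function(x, t, ...) x * log(t))
summary(fit_tv)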

Session 79: Biomarker Based Adaptive Design and Analysis for Targeted Agent Development

An Oncology Case Study in Using a De Novo Approach to Identifying Drug-sensitive Subpopulations in the Add-on Therapy Setting
Jared Lunceford and *Yue Wang
Merck Research Laboratories
[email protected]
In many Phase II oncology trials, an investigational drug (ID) is used in combination with standard-of-care (SOC) treatment, and efficacy is compared to SOC alone. If the trial fails to meet its primary efficacy hypothesis, a retrospective exploratory search for a subset of the patient population for which efficacy is adequate may ensue. In the context of microarray gene expression profiling, this involves constructing de novo models for survival outcomes that identify ID-predictive signatures in the presence of potential associations between expression and survival outcome under SOC, i.e., associations common to both ID+SOC and SOC alone. We outline a strategy, based on the supervised principal components method (Bair et al. 2006) in a Cox regression setting, to discover ID-predictive signatures in the presence of SOC-predictive signatures, and we suggest a formal test to demonstrate that the ID-predictive component of the model is statistically significant. Using a case-study trial as the

basis for simulation, the sample sizes necessary to demonstrate statistically significant ID-predictive effects in the de novo microarray setting are explored.

Biomarker-Based Bayesian Adaptive Designs for Targeted Agent Development
*J. Jack Lee, Suyu Liu and Nan Chen
The University of Texas MD Anderson Cancer Center
[email protected]
Advances in biomedicine have fueled the development of targeted agents in cancer therapy. Targeted therapies have been shown to be more efficacious and less toxic than conventional chemotherapies. Targeted therapies, however, do not work for all patients. One major challenge is to identify markers for predicting treatment efficacy. We have developed biomarker-based Bayesian adaptive designs to (1) identify prognostic and predictive markers for targeted agents, (2) test treatment efficacy, and (3) provide better treatments for patients enrolled in the trial. In contrast to frequentist equal randomization designs, Bayesian adaptive randomization designs allow treating more patients with effective treatments, monitoring the trial more frequently to stop ineffective treatments early, and increasing efficiency while controlling type I and type II errors. Bayesian adaptive designs can be more efficient, more ethical, and more flexible in study conduct than standard designs. Examples and lessons learned from recent trials will be given.

The Cross-Validated Adaptive Signature Design
Boris Freidlin1, *Wenyu Jiang2 and Richard Simon1
1 National Cancer Institute, 2 Queen's University
[email protected]
Many anticancer therapies benefit only a subset of treated patients and may be overlooked by the traditional broad-eligibility approach to designing phase III clinical trials. New biotechnologies such as microarrays can be used to identify the patients that are most likely to benefit from anticancer therapies. However, due to the high-dimensional nature of the genomic data, developing a reliable classifier by the time the definitive phase III trial is designed may not be feasible. Previously, Freidlin and Simon (Clinical Cancer Research, 2005) introduced the adaptive signature design, which combines a prospective development of a sensitive-patient classifier and a properly powered test for overall effect in a single pivotal trial. In this article, we propose a cross-validation extension of the adaptive signature design that optimizes the efficiency of both the classifier development and the validation components of the design. The new design is evaluated through simulations and is applied to data from a randomized breast cancer trial. The cross-validation approach is shown to considerably improve the performance of the adaptive signature design. We also describe approaches to the estimation of the treatment effect for the identified sensitive subpopulation.

Biomarker Decision Rules in Early Phase Oncology Proof of Mechanism Trials
David Raunig
Pfizer Inc.
[email protected]
Early-phase clinical trials with FDG- or FLT-PET imaging endpoints are concerned not with survival but with Proof of Mechanism (POM); however, guidelines for FDG/FLT-PET are based on survival, and the high failure rate of Phase II/III trials for drugs that passed POM may be due to improperly applying group-level inference to patient response. Survival-related categorization could be far removed from the Mechanism of Action, and mechanistic
response in these patients may not be associated with survival. Heterogeneous patient populations are often small and difficult for mixture methods to handle with reliable and robust results. We propose that a minimum acceptable value (MAV) for response, based on a single patient response conditional on the nonresponder population, is a practical method of limiting the false response rate. Methods: Simulations were used to establish the operating characteristics of the LME and the t-test, based on Phase 1 clinical trial FDG-PET data from a FAK inhibitor study with a population that was hypothesized to have a 30% response rate. Results: The LME model was robust in all cohorts, including those with half of the patients having only one lesion. The LME has similar operating characteristics to t-tests for all tested lesion configurations; both have a high false responder rate that approaches 40% when evaluating for a significant difference from 0% response. A MAV of 12%, based on a within-subject correlation of 0.7, and the use of the False Discovery Rate for multiple patient comparisons controlled the experimentwise false responder rate to the designed value of 0.2. Conclusions: The LME is a robust method for the analysis of POM data with as few as half of the subjects having multiple lesions. With these procedures and 10 patients, POM can be declared with 95% confidence with 4 declared responders.

Session 80: Analysis of Complex Data

Efficient High-Order Gene-by-Gene Interaction Analysis for Genome-wide Association Analysis
*Taesung Park1, Sohee Oh1, Jaehoon Lee1, Min-Seok1 and Kyunga Kim2
1 Seoul National University, 2 Sookmyung Women's University
[email protected]
Most complex biological phenotypes are affected by multiple genes and environmental factors. Thus, the investigation of gene-gene and gene-environment interactions can be essential in understanding the genetic architecture of complex traits. Many different methods have been proposed to analyze gene-gene interactions in genetic association studies. Among them, the multifactor dimensionality reduction (MDR) method is known to have advantages in examining and detecting high-order interactions and has been widely applied to detect gene-gene interactions in many common diseases. However, the current MDR focuses on SNP-level interactions. In this paper, we propose the gene-based MDR method (GENE-MDR), which focuses on gene-level interactions. GENE-MDR first summarizes the gene effect through SNP interactions and then performs MDR analysis on the summarized gene effects. GENE-MDR is very effective in dealing with higher-order interactions in GWAS.

Comparative Evaluation of Gene-Set Analysis Methods for the Survival Phenotype
*S. Y. Lee1, J. H. Kim2 and S. H. Lee2
1 Sejong University and University of Washington, 2 Sejong University
[email protected]
Many gene-set analysis methods have been previously proposed and compared through simulation studies and analysis of real datasets for binary phenotypes. We focused on the survival phenotype and compared the performance of Gene Set Enrichment Analysis (GSEA), Global test (GT), and Wald-type test (WT) methods based on simulated datasets and a real example of ovarian cancer data. We considered two versions of the GSEA test by allowing different
weights: GSEA1 has an equal weight, which yields results similar to the Kolmogorov-Smirnov test, while GSEA2 has a weight that takes into account the correlation between genes and the phenotype. We compared GSEA1, GSEA2, GT and WT through a simulation experiment in which various scenarios are considered depending on the correlation structure of genes and the association parameter between survival and the genes. Simulation results showed that both GT and WT consistently have higher power than GSEA1 and GSEA2 across all scenarios. However, the four tests show inconsistent trends in power depending on the combination of correlation structure and association parameter. For the real-world dataset of ovarian cancer, GT and WT detected 15 and 13 significant pathways among 204 pathways, respectively, while GSEA2 detected only one pathway, with p < 0.01. In addition, under FDR control with q < 0.05, GT detected three pathways and WT detected only one pathway. From the simulated datasets and the real example, both the GT and WT tests have high power in simulated datasets but seem to be too liberal in the real dataset, whereas the GSEA1 and GSEA2 tests have low power in simulated datasets and tend to be too conservative in the real dataset. This may be ascribed to the fact that GSEA is a nonparametric rank-based test whereas both GT and WT are regression-based tests. We also found that the power of the four tests is much higher under correlated genes than under independent genes when survival is positively associated with genes. It seems that there is a synergistic effect in detecting significant gene sets when significant genes have within-correlation and the association between survival and genes is positive or negative (i.e., one-direction correlation).

Bayes Multiple Decision Functions
*Wensong Wu1 and Edsel Pena2
1,2 Department of Statistics, University of South Carolina
[email protected]
In this presentation, a general form of Bayes multiple decision functions (BMDF) with respect to a class of cost-weighted loss functions is introduced. An algorithm for finding the BMDF is provided based upon posterior expectations. In particular, loss functions such as the false discovery proportion (FDP), false nondiscovery proportion (FNP), and missed discovery proportion (MDP) are considered, and the cost weights are user-determined. Results are applicable in many settings including multiple hypothesis testing, multiple classification and prediction, and high-dimensional variable selection. A dependent data structure is allowed and is modeled through a class of frailty-induced Archimedean copulas. In particular, non-Gaussian dependent data structures are of interest, especially in settings with failure-time data. Computation of the posterior expectations is facilitated by using Sequential Monte Carlo (SMC) algorithms. Simulations in both simple-versus-simple and composite settings are presented, as well as an application to microarray data.
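
The sketch below is not the BMDF algorithm of the preceding talk; it is a much simpler rule in the same spirit, included only to illustrate how posterior probabilities can drive a multiple-decision rule: reject the hypotheses with the smallest posterior null probabilities while the posterior expected false discovery proportion among the rejections stays below a chosen bound. All numbers are simulated.

# Illustration only: a simple posterior-probability decision rule (not the authors' BMDF)
reject_by_posterior_fdp <- function(p_null, fdp_bound = 0.10) {
  ord <- order(p_null)                          # most likely non-null first
  running_fdp <- cumsum(p_null[ord]) / seq_along(ord)
  k <- max(c(0, which(running_fdp <= fdp_bound)))
  sort(ord[seq_len(k)])                         # indices of rejected hypotheses
}

set.seed(3)
p_null <- c(runif(20, 0, 0.2), runif(180, 0.3, 1))   # toy posterior null probabilities
length(reject_by_posterior_fdp(p_null, fdp_bound = 0.10))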

Session 81: Historical Insight on Statisticians' Role in Pharmaceutical Development

The Changing Role of Biostatistics in Drug Development
Judith D. Goldberg
New York University School of Medicine
[email protected]
This talk discusses the roles of biostatistics in the pharmaceutical drug development process. In particular, the focus will be on the evolving roles and opportunities for statisticians in a rapidly changing

environment that emphasizes efficiency and speed. The opportunities arise in a variety of contexts that include the use of high-throughput technologies for compound screening; the incorporation of evolving technologies such as genomics into the development process from preclinical through clinical development; and innovations in clinical trial study designs and analysis methods. Statisticians also have a unique viewpoint that can provide insights and innovations in the strategic planning of drug development programs. The perspective of this talk is personal, from my experience as head of statistics groups in the pharmaceutical industry and in academia.

Impact of Statistical Thinking in the Regulatory Review of Drug Development
Fanhui Kong
U.S. Food and Drug Administration
[email protected]
There are many difficult issues in the design, conduct and analysis of the clinical trials of drug development. The major role of a statistician is to understand the issues, formulate the problems, and give reasonable solutions. By correctly formulating the problem and applying statistical methods, statisticians can make an important impact on drug development. With real examples I have encountered during my review of clinical trials in various therapeutic areas such as psychiatry and oncology, I will show how independent thinking may lead to new observations of the difficult issues, how reasonable assumptions may help to formulate the problem, how correct evaluation of the solutions may provide more reasonable answers, and how meaningful communications with physicians may be helpful in both defining problems and interpreting solutions.

Asking Foolish Questions—A Statistician's Role in Drug Research
David Salsburg
Salsburg Statistical Consulting
[email protected]
As the first statistician hired by Pfizer Central Research, I had the opportunity to poke my fingers into almost every aspect of drug research and development, and I even found myself consulting for other branches of the company. Being in on the planning usually meant that I was also expected to analyze the data that emerged. To protect myself from the unanalyzable, I often had to ask foolish questions, like “Can you really measure this?” Since I was not an expert in pharmacology, or obstetrics, or cardiology, or toxicology, it did not detract from my aura of “expertise” to ask such questions. This talk will discuss the effects of such questions.

Session 82: Handling Heaping

Accounting for Heaping in Retrospectively Reported Event Data - A Mixture-Model Approach
*Haim Y. Bar and Dean R. Lillard
Cornell University
[email protected]
When event data are retrospectively reported, more temporally distal events tend to get “heaped” on even multiples of reporting units. Heaping may introduce a type of attenuation bias because it causes researchers to mismatch time-varying right-hand side variables. We develop a model-based approach to estimate the extent of heaping in the data and how it affects regression parameter estimates. We use smoking cessation data as a motivating example, but our method is general. It facilitates the use of retrospective data from the

multitude of cross-sectional and longitudinal studies worldwide that collect, and potentially could collect, event data.

Modeling Heaping in Self-Reported Longitudinal Cigarette Count Data
*Hao Wang1 and Daniel F. Heitjan2
1 The Johns Hopkins University, 2 University of Pennsylvania
[email protected]
Studies of smoking behavior commonly use the time-line follow-back (TLFB) method, or periodic retrospective recall, to gather data on daily cigarette consumption. TLFB is considered adequate for identifying periods of abstinence and lapse but not for measurement of daily cigarette consumption, owing to substantial recall and digit preference biases. With the development of hand-held electronic diaries (EDs), it has become possible to collect cigarette consumption data using ecological momentary assessment (EMA), or the instantaneous recording of each cigarette as it is smoked. EMA data, because they do not rely on retrospective recall, are thought to be far superior to TLFB data. In this article we present an analysis of cigarette consumption data collected simultaneously by both methods from 236 active smokers in the pre-quit phase of a smoking cessation study. We define a statistical model that describes the genesis of the TLFB records as a two-stage process of mis-remembering and rounding, including fixed and random effects at each stage. We use Bayesian methods to estimate the model, and we evaluate its adequacy by studying histograms of imputed values of the latent remembered cigarette count. Our analysis suggests that both mis-remembering and heaping contribute substantially to the distortion of self-reported cigarette counts. The model is potentially useful in other applications where it is desirable to understand the process by which true observations are remembered and reported.

Non-Parametric Estimation of the Reporting Mechanism from Precise and Heaped Self-Report Data
*Sandra D. Griffith1, Saul Shiffman2 and Daniel F. Heitjan1
1 University of Pennsylvania, 2 University of Pittsburgh
[email protected]
Open-ended numerical measures, often used in self-report to assess quantities or frequencies, are subject to a form of measurement error termed heaping. Heaping occurs when quantities are reported with varying levels of precision. Digit preference is a special case of heaping where the preferred values are round numbers. Daily cigarette counts, for example, commonly exhibit heaps at multiples of 20, and to a lesser extent 2, 5, and 10, when measured by retrospective recall methods. Because heaping can introduce substantial bias to estimates, conclusions drawn from data subject to heaping are suspect. Several methods have been proposed to estimate the true underlying distribution from heaped data, but all depend to a considerable extent on unverifiable assumptions about the heaping mechanism. We are in possession of a data set in which subjects reported cigarette consumption by both a precise method (ecological momentary assessment, as implemented with a hand-held electronic device) and a more traditional, imprecise method (time-line follow-back, or periodic retrospective recall). We propose a nonparametric method to estimate the conditional distribution of the heaping mechanism given the precise measurement. We measure uncertainty in the heaping mechanism with a bootstrap approach. Application to our data suggests that recall errors are a more important source of bias than actual heaping.

Model-Based Analysis of Heaped Longitudinal Cigarette Count
Data in Smoking Cessation Trials
Sandra Griffith1, *Daniel F. Heitjan1, Yimei Li2, Hao Wang3 and E. Paul Wileyto1
1 University of Pennsylvania, 2 The Children's Hospital of Philadelphia, 3 The Johns Hopkins University
[email protected]
Raw time-line follow-back (TLFB) data from smoking cessation trials consist of retrospectively elicited daily counts of cigarettes smoked. Because complete abstinence is the goal, these counts commonly have an over-abundance of zeros and are best represented by a zero-inflated (ZI) Poisson or negative binomial model. A significant nuisance with self-reported cigarette consumption data is heaping, or the tendency of smokers to report daily counts rounded to multiples of 5, 10 and 20. Heaping can bias the analyses in longitudinal count models. Over the past several years we have been working on ways to model TLFB data given true cigarette consumption as measured by ecological momentary assessment (EMA), or the recording of cigarettes instantaneously as they are smoked. We are in possession of a dataset where cigarette counts were measured by both EMA (electronic diaries) and TLFB (periodic recall). In this project we will use these data to create a model to impute accurate cigarette counts from TLFB data from a smoking-cessation clinical trial. With the multiply imputed true counts, we will fit longitudinal ZI Poisson and negative binomial models to the clinical trial data. This will offer a more accurate and efficient analysis, addressing issues such as whether treatment has effects on the number of cigarettes smoked, the time dependence of treatment effects, and the effects of treatment on mean consumption in non-quitters.
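
To make the heaping phenomenon discussed throughout this session concrete, here is a small R simulation of digit preference in self-reported daily cigarette counts; the Poisson consumption model and the rounding probabilities are invented for illustration and are not the fitted models from these talks.

# Illustration only: simulate heaping / digit preference in self-reports
set.seed(4)
true_cpd <- rpois(500, lambda = 17)       # "true" cigarettes per day

heap <- function(x) {
  u <- runif(length(x))
  reported <- x                                        # some report exactly
  reported[u < 0.50] <- 5  * round(x[u < 0.50] / 5)    # some round to 5s
  reported[u < 0.25] <- 10 * round(x[u < 0.25] / 10)   # fewer round to 10s
  reported[u < 0.10] <- 20 * round(x[u < 0.10] / 20)   # a few round to 20s
  reported
}

reported_cpd <- heap(true_cpd)
table(reported_cpd)                       # note the spikes at multiples of 5, 10, 20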

Session 83: Panel Session II: Industry-Academia Partnership: Successful Stories and Opportunities

Panel Discussion
Kenneth Koury1, Nan Laird2, Marcia Levenstein3 and Raymond Bain1
1 Merck Research Laboratories, 2 Harvard University, 3 Pfizer Inc.
[email protected]
Effective partnerships between pharmaceutical companies and academic institutions provide a mechanism for capitalizing on the complementary nature of the roles of industry and academia in research related to pharmaceutical product development. The need to accurately assess the health effects of new and existing therapies represents a broad area of overlap between the pharmaceutical industry and academic departments of statistics or biostatistics that are committed to improving public health. These partnerships can facilitate the exploration of mutual research interests and the generation of new research projects which affect the practice of quantitative methods in the pharmaceutical industry. In this session, the speakers will share experiences from two successful partnerships that received the SPAIG (Statistical Partnerships among Academe, Industry, and Government) Award from the American Statistical Association — the Harvard/Merck (formerly Schering-Plough) Partnership and the Rutgers/Pfizer Partnership. In addition, they will discuss partnership opportunities in other countries, particularly industry-academic collaborations in China. This session will also include an open discussion to address questions and comments from the audience. Panelists:
• Nan Laird, Harvard University; [email protected]
• Raymond Bain, Merck Research Laboratories; [email protected]
• Marcia Levenstein, Pfizer Inc.; [email protected]
• Kenneth Koury, Merck Research Laboratories; [email protected]

Session 84: Statistical Methods for Disease Genetics and Genomics

Statistical Association Tests of Rare Variants
Wei Pan
University of Minnesota
[email protected]
In anticipation of the availability of next-generation sequencing data, there has been increasing interest in association analysis of rare variants (RVs). Due to the extremely low frequency of an RV, single variant based analysis and many existing tests developed for common variants (CVs) may not be suitable. Hence, it is of interest to develop powerful statistical tests to assess association between complex traits and RVs with sequence data. There are quite a few new statistical challenges associated with the analysis of RVs. In this talk we will present some new statistical tests to detect disease association with RVs.

Genotype Imputation in Whole Genome Shotgun Sequencing Data Using Haplotype Information of Reads
*Kui Zhang, Degui Zhi, Nianjun Liu and Jihua Wu
Department of Biostatistics, University of Alabama, Birmingham
[email protected]
The high-throughput shotgun sequencing technologies play increasingly important roles in identifying genetic variants for complex human diseases. Such technologies differ from array-based genotyping technologies in that the counts of the reference base and/or an alternative base at each site are observed rather than the underlying genotype. In addition, some regions may not contain count data because they are poorly covered due to chance. It is important to accurately impute unobserved genotypes based on the observed and missing count data. Li et al. (2010) developed a Hidden Markov Model (HMM) to impute unobserved genotypes based on the count data at each site. However, with the rapid development of whole genome shotgun sequencing technologies, new versions of these technologies can generate longer reads that may cover two or more consecutive sites. Such long reads provide haplotype information on consecutive sites and thus can be used to improve the accuracy of genotype imputation. We extend the HMM of Li et al. (2010) to incorporate such haplotype information from reads that cover two or more consecutive sites. Our simulations show that the new method outperforms the original method.

Statistical Methods for Testing CNV Association
Hongzhe Li
University of Pennsylvania
[email protected]
Copy number variants (CNVs) in germline DNA have been shown to be associated with many complex diseases. We discuss several statistical methods for detecting the CNVs that are associated with disease risk. Our methods do not require detection of the CNVs in individual samples and are shown to be more powerful than the commonly

used two-stage methods. We present some theoretical results on approximating the p-values for testing association in the framework of measurement error models. Simulations and analysis of a neuroblastoma data set are used to illustrate the methods.

Graph-Based Bayesian Interaction Mapping in Disease Association Studies
Yu Zhang
The Pennsylvania State University
[email protected]
Genome-wide association studies are becoming increasingly important given the advances in high-throughput genotyping and sequencing technologies. In addition to detecting marginal associations of individual markers, it is also of interest to identify multi-marker associations and gene-gene interactions. Mapping from an astronomical number of possible interactions at the genome scale is challenging both computationally and statistically. For high-density markers, it is further critical to account for marker dependence so as to improve the mapping resolution and reduce the analytical complexity. We introduce a novel graph-based Bayesian model for large-scale multi-locus association mapping. Compared with existing algorithms and our previous BEAM models, the new method has two major improvements. 1) We construct disease interaction graphs to identify multiple complex gene-gene interactions. Compared with saturated interaction models, graphs offer greater flexibility that can increase the power of detecting interactions. The inferred graphs also provide detailed structures between disease-associated markers. 2) We design probabilistic models to account for the complex dependence between high-density markers. Without accounting for marker dependence, hundreds of similar interactions due to correlation with the same disease mutations will be detected, which will substantially increase the computational and analytical burden without providing additional insight into the disease. We will use simulated and real data to demonstrate the performance of the new method compared with existing approaches. Our method can be further adapted to large-scale regulatory data sets measured at the individual level, for which interaction mapping will also be very interesting and informative.
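
For readers new to rare-variant association analysis (the topic of the first talk in this session), the sketch below shows the classical "burden" approach that newer tests improve upon: rare variants in a region are collapsed into a single per-subject count and tested with logistic regression. The genotypes and phenotypes are simulated, and this is not one of the new tests proposed in the talk.

# Illustration only: a simple rare-variant burden test on simulated data
set.seed(5)
n <- 1000; m <- 30                                   # subjects, rare variants in a region
maf <- runif(m, 0.001, 0.01)                         # rare minor allele frequencies
G <- sapply(maf, function(f) rbinom(n, 2, f))        # n x m genotype matrix (0/1/2)
burden <- rowSums(G)                                 # rare-allele count per subject
y <- rbinom(n, 1, plogis(-1 + 0.5 * burden))         # disease status depends on burden

summary(glm(y ~ burden, family = binomial))$coefficients["burden", ]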

Session 85: Enhancing Clinical Development Efficiency with Adaptive Decision Making

Promising Zone Designs for Oncology Trials
Cyrus Mehta
Cytel Inc.
[email protected]
Phase 3 trials of metastatic disease, where overall survival is the primary endpoint, are difficult to launch because they usually require a huge up-front sample size commitment. We present an adaptive approach in which the study starts out with a modest sample size that can be expanded later, based on interim data from the study itself. The method is illustrated with an ongoing trial of acute myeloid leukemia. This design satisfies the statistical and operational requirements of the FDA Draft Guidance on Adaptive Design.

Optimal GNG Decision Rules for an Adaptive Seamless Phase II/III Oncology Trial
*Cong Chen and Linda Sun
Merck & Co., Inc.
[email protected]
In this presentation, we would like to address the following issues: 1) how to effectively incorporate surrogate biomarker data into the

decision matrix; 2) how to derive cost-effective Go/No-Go (GNG) bars for the transition from Phase II to Phase III (and, for that matter, how to set cost-effective futility bars in Phase III); and 3) how to fully realize the benefit of a seamless design in practice while keeping the risk under check.

Type 1 and Type 2 Error Rates in Early Clinical Investigations
Qing Liu and *Pilar Lim
Johnson & Johnson
[email protected]
The 2-sided significance level 0.05 is often used for confirmatory phase 3 trials. It is not clear, however, what significance level to use for clinical trials in early clinical development programs. We consider a simple portfolio consisting of three clinical programs where the costs of phase 2a, 2b and phase 3 trials, as well as considerations for attrition, are given. We evaluate the portfolio against four different clinical development strategies, one of which employs a phase 2a/2b combination design. In particular, each development strategy is characterized by setting different type 1 and type 2 error rates so that the cost and success probability can be evaluated. We propose a cost-effectiveness ratio for ranking clinical development plans. We conclude that the strategy employing the phase 2a/2b design is the most cost-effective.

A Framework for Joint Modeling and Assessment of Efficacy and Safety Data for Probability of Success Evaluation and Optimal Dose Selection
*Weili He1, Xiting Cao2 and Lu Xu3
1 Merck & Co., Inc., 2 University of Minnesota, 3 GlaxoSmithKline Oncology R&D
[email protected]
Evaluation of clinical proof of concept (POC), optimal dose selection, and phase III probability of success (POS) has traditionally been conducted by subjective and qualitative assessment of efficacy and safety data. This, in part, was responsible for the numerous failed phase III programs in the past. The need to utilize more quantitative approaches to assess efficacy and safety profiles has never been greater. In this presentation, we propose a framework that incorporates efficacy and safety data simultaneously for joint evaluation of clinical POC, optimal dose selection, and phase III POS. Simulation studies were conducted to evaluate the properties of our proposed methods. The proposed approach was applied to two case studies. Based on the true outcomes of the two case studies, the assessment based on our proposed approach suggested a reasonable path forward for both clinical programs.
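
As a hedged numerical illustration of the interim quantity that drives promising-zone designs like the one described in the first talk of this session, the R function below computes conditional power under the current trend for a normally distributed test statistic; the formula is the standard textbook one, and the numbers are illustrative, not the decision rule of the trial discussed.

# Illustration only: conditional power under the current trend at an interim look
conditional_power <- function(z1, t, alpha = 0.025) {
  # z1: interim z-statistic; t: information fraction at the interim; one-sided alpha
  za <- qnorm(1 - alpha)
  1 - pnorm((za - z1 * sqrt(t) - z1 * (1 - t) / sqrt(t)) / sqrt(1 - t))
}

conditional_power(z1 = 1.2, t = 0.5)   # a modest trend at the halfway point
# A promising-zone rule expands the sample size only when this value falls in a
# pre-specified intermediate ("promising") range rather than being very low or high.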

Session 86: Multivariate and Subgroup Analysis

Methods for Dimension Reduction
Efstathia Bura
George Washington University
[email protected]
Sufficient Dimension Reduction (SDR) in regression comprises estimation of the dimension of the smallest (central) dimension reduction subspace and its basis elements. For SDR methods based on a kernel matrix, such as SIR and SAVE, the dimension estimation is equivalent to the estimation of the rank of a random matrix which is the sample based estimate of the kernel. A test for the rank of a random matrix amounts to testing how many of its eigen- or singular values are equal to zero. We propose two tests based on the smallest eigen- or singular values of the estimated matrix:
an asymptotic weighted chi-square test and a Wald-type asymptotic chi-square test. We also provide an asymptotic chi-square test for assessing whether elements of the left singular vectors of the random matrix are zero. These methods together constitute a unified approach for all SDR methods based on a kernel matrix that covers estimation of the central subspace and its dimension, as well as assessment of variable contributions to the lower dimensional predictor projections, with variable selection as a special case.

Bootstrap Methods for Assessing the Reliability of Subgroup Discovery in Clinical Trial Data
*Javier Cabrera1, Jiabin Wang1 and Ha Nguyen2
1 Rutgers University, 2 Pfizer, Inc.
Subgroup discovery for clinical trial data aims at characterizing groups of patients who benefit the most from a drug compared to older or competitive drugs or placebo. Standard data mining techniques such as CART, C4.5 and Bump Hunting have been applied for this purpose, but the criterion that is optimized does not directly address our objective, and the subgroups that are found may produce treatment effects that are overly optimistic both in effect size and in p-value. For these reasons many of these findings do not replicate in future studies. In this paper we propose (i) a method for subgroup discovery that directly optimizes the relevant criterion and (ii) a bootstrap-based strategy for assigning realistic p-values to the subgroups that are selected. We will illustrate our method by analyzing a group of clinical studies and characterizing the patients who respond better to one drug over competing treatments.

Log-Rank-Type Tests for Equality of Distributions in High-Dimensional Spaces
*Xiaoru Wu, Zhiliang Ying and Tian Zheng
Columbia University
[email protected]
Motivated by applications in high-dimensional settings, we propose a novel approach to testing equality of two or more populations by constructing a class of intensity centered score processes. The resulting tests are analogous in spirit to the well-known class of weighted log-rank statistics that is widely used in survival analysis. The test statistics are nonparametric, computationally simple and applicable to high-dimensional data. We establish the usual large sample properties by showing that the underlying log-rank score process converges weakly to a Gaussian random field with zero mean under the null hypothesis and with a drift under contiguous alternatives. For the Kolmogorov-Smirnov-type and the von Mises-type statistics, we also establish a consistency result for any fixed alternative. As a practical means to obtain approximate cutoff points for the test statistics, a simulation based resampling method is proposed, with theoretical justification given by establishing weak convergence for the randomly weighted log-rank score process. The new approach is applied to a study of brain activation measured by functional magnetic resonance imaging when performing two linguistic tasks and also to a prostate cancer DNA microarray data set.

Statistical Cluster Detection and Pervasive Surveillance of Nuclear Materials Using Mobile Sensors
Jerry Cheng1, *Minge Xie2, Rong Chen2 and Fred Roberts2
1 Columbia University, 2 Rutgers University
[email protected]
This talk outlines a robust system of a mobile sensor network and develops a spatial statistical model and algorithm to provide consistent and pervasive surveillance for nuclear or biological materials

100

in major cities. Specifically, we propose a design of a mobile sensor network, in which nuclear sensors and Global Position System (GPS) tracking devices are installed on a large number of vehicles such as taxicabs and police vehicles. Real time information from this network is processed at a central surveillance center, where mathematical and statistical analyses are performed. A latent spatial statistical model and formal statistical inferences for detecting multiple spatial clusters are developed, along with an EM/MCMC algorithm. Simulation studies are used to demonstrate the utility and effectiveness of such development.

Session 87: Statistical Issues Arising from Clinical Research

Comparing Paired Biomarkers in Predicting Health Outcomes
Xinhua Liu and Zhezhen Jin
Columbia University
[email protected]
For two biomarkers that are associated with a common health outcome, it is important to examine whether or not one biomarker has greater discrimination accuracy than the other. When a pair of biomarkers is measured on the same study subjects, they are likely to be correlated. To compare the paired biomarkers in predicting the same health outcome, we adopted a non-parametric test for item selection using criteria related to discrimination accuracy. The statistical test performed well in a simulation study, where the data were generated with and without random censoring on the response variable. The test was then applied to two studies. In the first study, investigating the effect of chronic exposure to arsenic and manganese on children's intellectual function, we compared the discrimination accuracy between blood arsenic and blood manganese. The test did not detect any statistically significant difference between the pair of biomarkers in predicting the health outcomes of intellectual function test scores. In the second study, examining baseline predictors of survival time of patients with primary biliary cirrhosis, the health outcome of interest was time to death subject to random censoring. We compared the prognostic factors of baseline serum albumin and bilirubin. The test result suggested that baseline serum bilirubin has significantly greater discrimination accuracy in predicting survival time than baseline serum albumin.
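
The following sketch is not the authors' test (which also accommodates censored outcomes); it is only a simple paired-bootstrap comparison of two correlated markers' discrimination (rank-based AUC) for a binary outcome, meant to illustrate the kind of question the talk above addresses. All data are simulated.

# Illustration only: paired-bootstrap comparison of two correlated markers' AUCs
auc <- function(marker, y) {               # rank-based (Mann-Whitney) AUC
  r <- rank(marker)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(6)
n <- 300
z <- rnorm(n)                               # latent health status
y <- rbinom(n, 1, plogis(z))                # binary outcome
marker1 <- z + rnorm(n, sd = 0.5)           # more informative marker
marker2 <- z + rnorm(n, sd = 1.5)           # noisier, correlated marker

diff_obs <- auc(marker1, y) - auc(marker2, y)
diff_boot <- replicate(2000, {
  i <- sample(n, replace = TRUE)            # resample subjects, keeping the pairing
  auc(marker1[i], y[i]) - auc(marker2[i], y[i])
})
c(difference = diff_obs, quantile(diff_boot, c(0.025, 0.975)))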

Regression with Latent Variables (MIMIC Models): A Better Way to Analyze Composite Scores from Instruments in Clinical Trials and Medical Research
*Chengwu Yang1, Barbara C. Tilley2, Anbesaw W. Selassie3 and Ruth L. Greene4
1 The Pennsylvania State University College of Medicine, 2 The University of Texas Health Science Center at Houston, 3 Medical University of South Carolina, 4 Johnson C. Smith University
[email protected]
Subjective outcomes (e.g., quality of life) are prolifically used in clinical trials and medical research, and they are measured by composite scores from instruments (e.g., the SF-36). The instruments' validity is a major concern, and an important validity issue is whether the instrument's factor structure is sustained. The conventional multivariate regression that directly regresses the composite scores on important covariates cannot investigate the factor structure, and therefore the results can be wrong. The multiple-indicator multiple-cause (MIMIC) model that regresses the latent domain scores on the same important covariates has these advantages: 1) it can assess the factor

structure of an instrument; 2) it can investigate whether any covariate effect on the composite scores is contaminated by measurement bias, i.e., differential item functioning (DIF). Three situations exist when applying MIMIC models to composite scores: 1) the instrument's factor structure is sustained and there is no DIF; 2) the factor structure is sustained but there is DIF; 3) the factor structure is not sustained. Appropriate analysis strategies corresponding to these three conditions will be discussed.

Analyzing the Influence of Local Failure on Distant Recurrence in Breast Carcinoma
Wei-Ting Hwang
University of Pennsylvania
[email protected]
Many clinical questions in studies of radiation or adjuvant therapy in oncology require examining the interrelationships of local and distant failure after the primary treatment. The assumption of independence of times to local and distant failure needed for the traditional actuarial methods is likely invalid. It is well known that patients who fail locally are more likely to fail distantly, and this complicates the attempt to isolate the effects of a particular treatment or prognostic factor on local failure from its effects on distant failure. Several commonsense approaches to this topic are introduced, including a multi-state model. A real example concerning failure patterns in early-stage breast cancer patients is presented.

Genetic Statistical Approach for Disease Risk Prediction
Tao Wang
Albert Einstein College of Medicine
[email protected]
Integrating genetic, environmental and family information into a risk prediction model may be an essential step toward future personalized healthcare. Research has been conducted to evaluate the role of newly identified genetic loci for early disease prediction and has found that the risk prediction models lack sufficient accuracy for potential clinical use. One limitation of these prediction models is that they only considered a small number of loci, with significant, but often small, marginal effects. In this talk, we will discuss statistical approaches that consider a large number of genetic and environmental risk predictors to build a more robust and accurate risk prediction model.
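
As a simplified, hedged illustration of the risk-prediction problem described in the last abstract above, the R sketch below builds a weighted genetic risk score from per-SNP effect estimates in a training set and checks its discrimination in a test set; everything is simulated, and real GWAS-based prediction involves far more markers, quality control, and validation than shown here.

# Illustration only: a toy polygenic risk score on simulated genotypes
set.seed(7)
n <- 2000; m <- 200
maf <- runif(m, 0.05, 0.5)
G <- sapply(maf, function(f) rbinom(n, 2, f))            # genotype matrix (0/1/2)
beta <- c(rnorm(20, 0, 0.15), rep(0, m - 20))            # only 20 SNPs truly matter
y <- rbinom(n, 1, plogis(drop(-1 + G %*% beta)))         # disease status

train <- 1:1000; test <- 1001:2000
# Per-SNP marginal log-odds-ratio estimates from the training half
bhat <- apply(G[train, ], 2, function(g)
  coef(glm(y[train] ~ g, family = binomial))[2])
score <- drop(G[test, ] %*% bhat)                        # weighted risk score

tapply(score, y[test], mean)   # crude check: cases should have higher mean scores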

Session 88: Recent Developments in Modeling Data with Informative Cluster Size

Semiparametric Regression Analysis of Clustered Interval-Censored Failure Time Data with Informative Cluster Size
*Xinyan Zhang1 and Jianguo Sun2
1 Harvard School of Public Health, 2 University of Missouri
[email protected]
We consider semiparametric modeling of clustered interval-censored failure time data when cluster sizes may be informative. Specifically, a weighted estimating equation (WEE)-based procedure and a within-cluster resampling (WCR)-based procedure in the framework of semiparametric regression are presented. The techniques improve upon previously published methods by allowing for variable cluster sizes and heterogeneous correlation structures. We also compare the results to the ones obtained by the traditional unweighted estimating equation (UWEE) method that does not consider correlation structures within clusters. A simulation experiment

is carried out in order to study the performance of the proposed approaches. The simulation study indicates that both approaches perform in general equally well in terms of point estimation, but the model-dependent approach yields confidence intervals with better coverage properties.

Analysis of Recurrent Gap Time Data Using the Weighted Risk-Set Method and the Modified Within-Cluster Resampling Method
Xianghua Luo1 and *Chiung-Yu Huang2
1 University of Minnesota, 2 National Institute of Allergy and Infectious Diseases
[email protected]
The gap times between recurrent events are often of primary interest in medical and epidemiological studies. The observed gap times cannot be naively treated as clustered survival data in analysis because of the sequential structure of recurrent events. We introduce two important building blocks, the averaged counting process and the averaged at-risk process, for the development of the weighted risk-set (WRS) estimation methods. We demonstrate that, with the use of these two empirical processes, existing risk-set based methods for univariate survival time data can be easily extended to analyze recurrent gap times. Additionally, we propose a modified within-cluster resampling (MWCR) method that can be easily implemented in standard software. We show that the MWCR estimators are asymptotically equivalent to the WRS estimators. An analysis of hospitalization data from the Danish Psychiatric Central Register is presented to illustrate the proposed methods.

A Joint Modeling Approach to Data with Informative Cluster Size: Robustness to the Cluster Size Model

Zhen Chen, Bo Zhang and Paul Albert

National Institute of Child Health & Human Development [email protected] In many biomedical and epidemiological studies, data are often clustered due to longitudinal follow up or repeated sampling. While in some clustered data the cluster size is pre-determined, in others it may be correlated with the outcome of subunits, resulting in informative cluster size. When the cluster size is informative, standard statistical procedures that ignore cluster size may produce biased estimates. One attractive framework for modeling data with informative cluster size is the joint modeling approach in which a common set of random effects are shared by both the outcome and cluster size models. In addition to making distributional assumptions on the shared random effects, the joint modeling approach needs to specify the cluster size model. Questions arise as to whether the joint modeling approach is robust to misspecification of the cluster size model. In this paper, we studied both asymptotic and finite-sample characteristics of the maximum likelihood estimators in joint models when the cluster size model is misspecified. We found that using an incorrect distribution for the cluster size may induce small to moderate biases, while using a misspecified functional form for the shared random parameter in the cluster size model results in nearly unbiased estimation of outcome model parameters. We also found that there is little efficiency loss under this model misspecification. A developmental toxicity study was used to motivate the research and to demonstrate the findings.
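As a small illustration of why informative cluster size matters, the following R sketch (simulated data, not the joint model studied here) lets a shared random effect drive both the cluster size and the outcome; the naive pooled mean then differs from the mean of cluster-level averages.
set.seed(1)
m <- 2000                                   # clusters
b <- rnorm(m)                               # shared cluster-level random effect
size <- 1 + rpois(m, exp(0.5 - 0.7 * b))    # cluster size depends on b (informative)
id <- rep(seq_len(m), size)
y <- b[id] + rnorm(sum(size))               # outcome shares the same random effect
mean(y)                                     # naive pooled mean, implicitly weighted by cluster size
mean(tapply(y, id, mean))                   # mean of cluster means, close to the true value 0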


Abstracts Session 89: New Developments in High Dimensional Variable Selection Model-Free Feature Screening for Ultrahigh Dimensional Data � Liping Zhu1 , Lexin Li2 , Runze Li3 and Lixing Zhu4 1 Shanghai University of Finance and Economics 2 North Carolina State University 3 The Pennsylvania State University 4 Hong Kong Baptist University [email protected] With the recent explosion of scientific data of unprecedented size and complexity, feature ranking and screening are playing an increasingly important role in many scientific studies. In this article, we propose a novel feature screening procedure under a unified model framework, which covers a wide variety of commonly used parametric and semiparametric models. The new method does not require imposing a specific model structure on regression functions, and thus is particularly appealing to ultrahigh-dimensional regressions, where there are a huge number of candidate predictors but little information about the actual model forms. We demonstrate that, with the number of predictors growing at an exponential rate of the sample size, the proposed procedure possesses consistency in ranking, which is both useful in its own right and can lead to consistency in selection. The new procedure is computationally efficient and simple, and exhibits a competent empirical performance in our intensive simulations and real data analysis. Non-Concave Penalized Composite Likelihood Estimation of Sparse Ising Models � Lingzhou Xue1 , Hui Zou1 and Tianxi Cai2 1 University of Minnesota 2 Harvard University [email protected] The Ising model is a useful tool for studying complex interactions within a system. The estimation of such a model, however, is rather challenging especially in the presence of high dimensional parameters. In this work, we propose efficient procedures for learning a sparse Ising model based on a penalized composite likelihood with non-concave penalties. Non-concave penalized likelihood estimation has received a lot of attention in recent years. However, such an approach is computationally prohibitive under high dimensional Ising models. To overcome such difficulties, we extend the methodology and theory of non-concave penalized likelihood to penalized composite likelihood estimation. An efficient solution path algorithm is devised by using a new coordinate-minorization-ascent algorithm. Asymptotic oracle properties of the proposed estimator are established with NP-dimensionality. We demonstrate its finite sample performance via simulation studies and further illustrate our proposal by studying the Human Immunodeficiency Virus type 1 (HIV-1) protease structure based on data from the Stanford HIV Drug Resistance Database. Model Selection Principles in Misspecified Models � Jinchi Lv1 and Jun S. Liu2 1 University of Southern California 2 Harvard University [email protected] Model selection is of fundamental importance to high-dimensional modeling featured in many contemporary applications. Classical principles of model selection include the Kullback-Leibler divergence principle and the Bayesian principle, which lead to the Akaike information criterion and Bayesian information criterion when models are correctly specified. Yet model misspecification is


unavoidable in practice. In this paper, we propose a family of semi-Bayesian principles for model selection in misspecified models that bridge the two well-known principles. We derive novel asymptotic expansions of the semi-Bayesian principles in misspecified generalized linear models, which give the new semi-Bayesian information criteria (extSICγ). A specific form of SIC admits a natural decomposition into the negative maximum quasi-log-likelihood, a penalty on model dimensionality, and a penalty directly on model misspecification. Numerical studies demonstrate the advantage of the SIC methodology for model selection in both correctly specified and misspecified models.
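For readers who want the classical criteria mentioned above in concrete form, a short R illustration (simulated data, unrelated to the paper's examples) computes AIC and BIC for a misspecified and a correctly specified logistic regression.
set.seed(12)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 * x + 0.5 * x^2))   # true mean is quadratic in x
m1 <- glm(y ~ x, family = binomial)                # misspecified working model
m2 <- glm(y ~ x + I(x^2), family = binomial)       # correctly specified model
c(AIC(m1), AIC(m2))   # AIC = -2*loglik + 2*df
c(BIC(m1), BIC(m2))   # BIC = -2*loglik + log(n)*df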

Session 90: Model Selection and Related Topics Adaptive Minimax Estimation with Sparse ℓq Constraints Yuhong Yang University of Minnesota [email protected] For high-dimensional linear regression, both the ℓ0 and ℓ1 norms on the coefficients have been used for sparse modeling of the regression function. In this work, we identify the minimax rates of convergence for regression estimation under ℓq constraints on the coefficients for 0 < q < 1 for both random and fixed designs. Furthermore, our estimators based on model combination/selection are shown to simultaneously achieve the optimal rates over the whole range 0 ≤ q ≤ 1. Our results also permit model mis-specification. The work is joint with Zhan Wang, Sandra Paterlini and Fuchang Gao. Consistency of Community Detection for Networks under Degree-Corrected Block Models *Yunpeng Zhao, Liza Levina and Ji Zhu Department of Statistics, University of Michigan [email protected] Detecting communities within networks has attracted wide interest in several different fields including computer science, social science and biology. The stochastic block model provides a statistical approach to model networks with multiple communities. However, fitting the stochastic block model to real-world networks often yields poor community detection results, for it ignores the variation among nodes within a community. To resolve this issue, Karrer and Newman (2010) proposed a degree-corrected block model which allows flexibility among vertex degrees within the same community. In this presentation, we establish a general theory for checking asymptotic consistency of community labels estimated by any criterion under the assumption of the degree-corrected block model. As examples, we obtain consistency conditions for several commonly used criteria under both un-corrected and corrected block models. Statistical Analysis of Next-Generation Sequencing Data Wenxuan Zhong University of Illinois at Urbana-Champaign [email protected] Next-generation sequencing technologies sequence tens of millions of DNA fragments in parallel. After these fragments, or short reads, are mapped to the genome, diverse types of data can be derived. The essential feature of these data is that they have signals of multiple scales at genomic positions that may not be pre-specified. To accurately characterize the multiscale feature of the data, I will present a unified nonparametric method for modeling various types of next-generation sequencing data. The excellent empirical performance

of the proposed method is demonstrated through simulation studies and real data examples. Efficient Estimation and Variable Selection in Varying-Coefficient Partially Linear Models *Bo Kai1, Runze Li2 and Hui Zou3 1 College of Charleston 2 The Pennsylvania State University 3 University of Minnesota [email protected] The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve nice efficiency properties, we further develop a semiparametric composite quantile regression (semi-CQR) procedure. We establish the asymptotic normality of the proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the semi-CQR method is much more efficient than the least-squares based method for many non-normal errors and only loses a little efficiency for normal errors. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedures.
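A stripped-down illustration of the composite quantile regression idea is sketched below in R for a purely linear model with simulated data: several quantile levels share a common slope while each keeps its own intercept (a generic toy fit, not the semi-CQR procedure itself).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rt(n, df = 3)               # heavy-tailed errors
taus <- c(0.25, 0.5, 0.75)
check <- function(u, tau) u * (tau - (u < 0))
cqr_loss <- function(par) {                  # par = (common slope, one intercept per tau)
  b <- par[1]; a <- par[-1]
  sum(sapply(seq_along(taus), function(k) sum(check(y - a[k] - b * x, taus[k]))))
}
fit <- optim(c(2, quantile(y, taus)), cqr_loss, control = list(maxit = 2000))
fit$par[1]                                   # pooled slope estimate across quantile levels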

Session 91: Empirical Likelihood and Its Application Population Empirical Likelihood for Nonparametric Inference in Survey Sampling *Sixia Chen and Jae Kwang Kim Department of Statistics, Iowa State University [email protected] Empirical likelihood is a popular tool for incorporating auxiliary information and constructing nonparametric confidence intervals. In survey sampling, sample elements are often selected by an unequal probability sampling method, and the empirical likelihood function is modified to account for the unequal probability sampling. Wu and Rao (2006) proposed a way of constructing confidence intervals using the pseudo empirical likelihood of Chen and Sitter (1999), but the computation is somewhat cumbersome. In this paper, we propose a novel empirical likelihood approach in survey sampling using the so-called population empirical likelihood. In the population empirical likelihood approach, a single empirical likelihood is defined for the finite population. The sampling design is incorporated into the constraint in the optimization of the population empirical likelihood. The proposed method leads to optimal estimation and does not require artificial adjustment for constructing the likelihood ratio confidence intervals. Furthermore, because a single empirical likelihood is defined for the finite population, it naturally incorporates auxiliary information obtained from multiple surveys. Results from two simulation studies show the performance of the proposed method.
Empirical Likelihood-Based Inferences for a Low Income Proportion Baoying Yang1, *Gengsheng Qin2 and Jing Qin3 1 Sichuan University and Georgia State University 2 Georgia State University 3 National Institute of Allergy and Infectious Diseases [email protected] The low income proportion is an important index in comparisons of poverty among countries around the world. The stability of a society depends heavily on this index. An accurate and reliable estimate of this index plays an important role in a government's economic policies. In this paper, the authors study empirical likelihood-based inferences for a low income proportion under simple random sampling and stratified random sampling designs. It is shown that the limiting distributions of the empirical likelihood ratios for the low income proportion are scaled chi-square distributions. The authors propose various empirical likelihood-based confidence intervals for the low income proportion. Extensive simulation studies are conducted to evaluate the relative performance of the normal approximation-based interval, bootstrap-based intervals, and the empirical likelihood-based intervals. The proposed methods are also applied to analyzing a real economic survey income dataset.
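To make the empirical likelihood ratio concrete, the following R sketch profiles it for a single proportion under simple random sampling on simulated data (a standard Owen-type calculation; the stratified and design-based versions studied in the paper require additional adjustments).
el_stat <- function(x, p0) {            # -2 log empirical likelihood ratio for a proportion
  g <- function(lam) sum((x - p0) / (1 + lam * (x - p0)))
  lam <- uniroot(g, lower = -1 / (1 - p0) + 1e-6, upper = 1 / p0 - 1e-6)$root
  2 * sum(log(1 + lam * (x - p0)))      # asymptotically chi-square with 1 df
}
set.seed(2)
x <- rbinom(500, 1, 0.15)               # indicator of low income
el_stat(x, 0.15)                        # near zero: p0 = 0.15 is well supported
el_stat(x, 0.25)                        # compare with qchisq(0.95, 1) = 3.84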

Efficient Empirical Likelihood Inference in Partial Linear Models for Longitudinal Data � Suojin Wang1 and Lianfen Qian2 1 Texas A&M University 2 Florida Atlantic University [email protected] In analyzing longitudinal data, within-subject correlations are a major factor that affects statistical efficiency. Working with a partially linear regression model for longitudinal data, we consider a subject-wise empirical likelihood based method that takes the within-subject correlations into consideration to estimate the model parameters and the nonparametric baseline function. A nonparametric version of the Wilks theorem for the limiting distribution of the empirical likelihood ratio, which relies on a kernel regression smoothing method to properly center data, is derived. In addition, the proposed method is shown to be locally efficient among a class of within-subject variance-covariance matrices. A simulation study and an application are reported to investigate the finite sample properties of the proposed method and compare it with the block empirical likelihood method by Xue and Zhu (2007) and the normal approximation. These numerical results demonstrate the usefulness of the proposed method.

Session 92: Design and Analysis Issues in DNA methylation Preprocessing Illumina DNA Methylation BeadArrays � Kimberly Siegmund and Tim Triche Jr. University of Southern California [email protected] Variation in the epigenome, the distribution of DNA-related modifications and structural features that inform the packaging of the DNA, can confer a host of specialized functions to different cells with the same genome. DNA methylation is the most commonly studied epigenetic mark; its importance well-established in human development and disease. Presently, DNA methylation microarrays provide the most cost-effective means of high-throughput analysis. As with other types of microarrays that measure gene expression, genotype, or copy number variation, technical artifacts are a concern. For Affymetrix gene expression arrays, the robust multiarray analysis (RMA) algorithm has gained widespread acceptance


for removing technical artifacts from the data. The main steps include correction for background fluorescence, normalization, data transformation and summarization. Illumina's BeadArray technology for gene expression now has its own preferred data preprocessing pipeline, taking advantage of the hundreds of control probes for background correction and sample normalization. Preprocessing of DNA methylation microarrays has been described for some restriction-digestion based approaches; however, no corresponding body of work has been conducted for Illumina's DNA methylation BeadArrays. We describe the Illumina BeadArray technology for DNA methylation analysis, and present a novel gamma-gamma convolution model to correct for bias due to background fluorescence. Using data generated on the HumanMethylation27K BeadArray, we find that the gamma-gamma convolution model reduces bias in signal intensity and reduces variation in probe signal across replicate samples better than competing approaches. Adaptations of the method for the recently launched HumanMethylation450K array will be discussed. Differential Inference of DNA Methylation Based on an Ensemble of Mixture Models *Shili Lin and Cenny Taslim The Ohio State University [email protected] Interest in epigenetics over the past few years has had a profound impact on many areas of genetic and genomic research. Several groundbreaking studies have provided strong evidence that a wide variety of human diseases, such as cancer, have a contributing epigenetic factor. Identification of biologically significant differential methylation between normal and cancer samples is an important issue in understanding the epigenetic signature of cancer. Methods for analyzing differential expression may be adapted for this purpose; there are also methods that were designed specifically for analyzing DNA methylation data. However, since methylation signatures can be highly heterogeneous even for tumors of the same type, a single model is rarely able to provide a satisfactory fit to a wide variety of data. In this talk, I will discuss an ensemble approach in which a collection of three classes of mixture models is considered. The ensemble approach has been implemented in an R package called DIME. The algorithm effectively selects the model that provides the best fit to the data, which leads to statistical inferences with high sensitivity and specificity. In addition to methylation data, DIME is also applicable to other high-throughput data given its ensemble nature. DNA Methylation Arrays as a Surrogate Measure of Cell Mixtures *E. Andres Houseman1, William P. Accomando1, Devin C. Koestler1, Brock C. Christensen1, Carmen J. Marsit1, Karl T. Kelsey1 and John K. Wiencke2 1 Brown University 2 University of California, San Francisco [email protected] Increasingly, researchers are recognizing the incredible potential for methods of quantifying the composition of lymphocyte populations to critically inform the underlying immuno-biology of disease states, as well as the immune response to almost all chronic medical conditions. In addition, several recent studies have shown that DNA methylation measured in whole peripheral blood serves to distinguish cancer cases from controls. We present a method, similar to regression calibration, for inferring changes in the distribution of white blood cells between different subpopulations (e.g. cases and controls) using DNA methylation signatures, assuming that an external validation set consisting of methylation signatures from purified white blood cell samples exists. We demonstrate our method on Head and Neck Squamous Cell Carcinoma (HNSCC) cases


and matched controls, showing that DNA methylation signatures register known changes in CD4+ and granulocyte populations in cancer cases compared with controls. Our statistical method, in combination with an appropriate external validation set, promises new opportunities for large-scale immunological studies. Method to Detect Differentially Methylated Loci with Case-Control Designs Shuang Wang Columbia University [email protected] It is now understood that virtually all human cancer types are the result of the accumulation of both genetic and epigenetic changes. DNA methylation is a molecular modification of DNA that is crucial for normal development. Genes that are rich in CpG dinucleotides are usually not methylated in normal tissues, but are frequently hypermethylated in cancer. With the advent of high-throughput platforms, the large-scale structure of genomic methylation patterns is available through genome-wide scans, and a tremendous amount of DNA methylation data has recently been generated. However, sophisticated statistical methods to handle complex DNA methylation data are very limited. Here we developed a likelihood-based Uniform-Normal mixture model to select differentially methylated loci between case and control groups. The idea is to model the data as three types of methylation loci: unmethylated, completely methylated, and partially methylated. A three-component mixture model with two Uniform distributions and one truncated normal distribution was used to model the three types. The mixture probabilities and the mean of the normal distribution were used to make inference about differentially methylated loci. Through extensive simulation studies, we demonstrated the feasibility and power of the proposed method. An application to a recently published study on ovarian cancer identified several methylation loci that are missed by the existing method.
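A bare-bones version of such a mixture fit can be sketched in R by direct maximum likelihood on simulated methylation proportions; the component supports and the single-group setting below are illustrative assumptions only, and the case-control comparison requires further steps.
dtnorm01 <- function(x, mu, s) dnorm(x, mu, s) / (pnorm(1, mu, s) - pnorm(0, mu, s))
negll <- function(par, x) {
  p <- exp(par[1:3]); p <- p / sum(p)          # mixing proportions via softmax
  mu <- plogis(par[4]); s <- exp(par[5])       # keep mu in (0, 1) and s > 0
  -sum(log(p[1] * dunif(x, 0, 0.2) +           # unmethylated component
           p[2] * dunif(x, 0.8, 1) +           # fully methylated component
           p[3] * dtnorm01(x, mu, s)))         # partially methylated component
}
set.seed(3)
x <- c(runif(300, 0, 0.2), runif(200, 0.8, 1), rnorm(100, 0.5, 0.08))
fit <- optim(c(0, 0, 0, 0, log(0.1)), negll, x = x, control = list(maxit = 2000))
round(exp(fit$par[1:3]) / sum(exp(fit$par[1:3])), 2)   # estimated mixing proportions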

Session 93: Recent Advances in Statistical Inference for Functional and Longitudinal Data Spatial Interpolation for Functional Data Tatiyana Apanasovich Thomas Jefferson University [email protected] Many applications from environmental science need to incorporate the spatial relationships between functional variables into statistical analysis, e.g. temperature profiles obtained from weather stations. Methodologies for functional data with spatial dependence provide an opportunity to combine ideas from spatial statistics and functional data analysis as well as to develop new methods unique to such data types. In the current study, we express each function in terms of a linear combination of basis functions (Ramsay and Silverman (1997)). We introduce the dependence between the coefficients using novel ideas from multivariate spatial statistics (Apanasovich and Genton (2010), Apanasovich et al. (2011)). We discuss several approaches for predicting curves at unvisited sites, so-called functional kriging. The methods are illustrated by predicting temperature profiles using data from 35 Canadian weather stations.

1. Ramsay, J.O. and Silverman, B.W. (1997). Functional Data Analysis. New York: Springer. 2. Apanasovich, T.V. and Genton, M.G. (2010). Cross-covariance functions for multivariate random fields based on latent dimensions. Biometrika, 97, 15-30. 3. Apanasovich, T.V., Genton, M.G., and Sun, Y. (2011). A Valid Matérn Class of Cross-Covariance Functions for Multivariate Random Fields with any Number of Components. Submitted. Simultaneous Inference for the Mean Function of Dense Functional Data *Guanqun Cao, Lijian Yang and David Todem Michigan State University [email protected] A polynomial spline estimator is proposed for the mean function of dense functional data together with a simultaneous confidence band which is asymptotically correct. In addition, the spline estimator and its accompanying confidence band enjoy semiparametric efficiency in the sense that they are asymptotically the same as if all random trajectories are observed entirely and without errors. The confidence band is also extended to the difference of mean functions of two populations of functional data. Simulation experiments provide strong evidence that corroborates the asymptotic theory while computing is efficient. The confidence band procedure is illustrated by analyzing the near infrared spectroscopy data. Additive Modeling of Functional Gradients Hans-Georg Mueller1 and *Fang Yao2 1 University of California, Davis 2 University of Toronto [email protected] We consider the problem of estimating functional derivatives and gradients in the framework of a functional regression setting where one observes functional predictors and scalar responses. Derivatives are then defined as functional directional derivatives which indicate how changes in the predictor function in a specified functional direction are associated with corresponding changes in the scalar response. Aiming at a model-free approach, navigating the curse of dimension requires imposing suitable structural constraints. Accordingly, we develop functional derivative estimation within an additive regression framework. Here the additive components of functional derivatives correspond to derivatives of nonparametric one-dimensional regression functions with the functional principal components of predictor processes as arguments. This approach requires nothing more than estimating derivatives of one-dimensional nonparametric regressions, and thus is computationally very straightforward to implement, while it also provides substantial flexibility, fast computation and asymptotic consistency. We demonstrate the estimation and interpretation of the resulting functional derivatives and functional gradient fields in a study of the dependence of lifetime fertility of flies on early life reproductive trajectories. A Confidence Corridor for Sparse Longitudinal Data Curves *Shuzhuan Zheng1, Lijian Yang2 and Wolfgang K. Hardle3 1 Michigan State University 2 Michigan State University and Soochow University 3 Humboldt-Universitat zu Berlin and National Central University [email protected] Longitudinal data analysis is a central piece of statistics. The data are curves and they are observed at random locations. This makes the construction of a simultaneous confidence corridor (SCC) (confidence band) for the mean function a challenging task on both the theoretical and the practical side. Here we propose a method based on local linear smoothing that is implemented in the sparse (i.e.,

low number of nonzero coefficients) modelling situation. An SCC is constructed based on recent results obtained in applied probability theory. The precision and performance is demonstrated in a spectrum of simulations and applied to growth curve data. Technically speaking, our paper intensively uses recent insights into extreme value theory that are also employed to construct a shoal of confidence intervals (SCI).
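The local linear smoothing step underlying such a corridor can be illustrated with a short R sketch on simulated sparse curves (fixed bandwidth, pointwise estimates only; the simultaneous band itself relies on the extreme value results mentioned above).
loclin <- function(t0, t, y, h) {              # local linear estimate of the mean at t0
  w <- dnorm((t - t0) / h)
  unname(coef(lm(y ~ I(t - t0), weights = w))[1])
}
set.seed(4)
obs <- do.call(rbind, lapply(1:50, function(i) {
  ti <- sort(runif(sample(3:6, 1)))            # each subject observed at a few random times
  data.frame(id = i, t = ti, y = sin(2 * pi * ti) + rnorm(length(ti), sd = 0.3))
}))
grid <- seq(0.05, 0.95, by = 0.05)
mu_hat <- sapply(grid, loclin, t = obs$t, y = obs$y, h = 0.1)
# mu_hat tracks the true mean function sin(2*pi*t) over the grid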

Session 94: Emerging Statistical Methods and Theories for Complex and Large Data Large Volatility Matrix Inference � Yazhen Wang1 and Jian Zou2 1 University of Wisconsin-Madison 2 National Institute of Statistical Sciences [email protected] High-frequency data observed on the prices of financial assets are commonly modeled by diffusion processes with micro-structure noise, and realized volatility based methods are often used to estimate integrated volatility. For problems involving with a large number of assets, the estimation objects we face are volatility matrices of large size. The existing volatility estimators work well for a small number of assets but perform poorly when the number of assets is very large. In fact, they are inconsistent when both the number, p, of the assets and the average sample size, n, of the price data on the p assets go to infinity. This talk will study large volatility matrix estimation and the dynamics of the large volatility matrices. I will describe asymptotic theory for the proposed approaches in the framework that allows both n and p to approach to infinity. Biological Pathway Selection through Nonlinear Dimension Reduction Hongjie Zhu and � Lexin Li North Carolina State University [email protected] In the analysis of high-throughput biological data, it is often believed that the biological units such as genes behave interactively by groups, i.e., pathways in our context. It is conceivable that utilization of priorly available pathway knowledge would greatly facilitate both interpretation and estimation in statistical analysis of such high-dimensional biological data. In this article, we propose a two-step procedure for the purpose of identifying pathways that are related to and influence the clinical phenotype. In the first step, a nonlinear dimension reduction method is proposed, which permits flexible within-pathway gene interactions as well as nonlinear pathway effects on the response. In the second step, a regularized modelbased pathway ranking and selection procedure is developed that is built upon the summary features extracted from the first step. Simulations suggest that the new method performs favorably compared to the existing solutions. An analysis of a glioblastoma microarray data finds four pathways that have evidence of support from the biological literature. We will also briefly talk about groupwise dimension reduction in general. Smooth Shrinkage Estimators for High-Dimensional Linear Models Lee Dicker Rutgers University [email protected] I will discuss the out-of-sample prediction error (predictive risk) associated with two classes of shrinkage estimators for the linear model: James-Stein type shrinkage estimators and ridge regression


estimators. Our study is motivated by problems in high-dimensional data analysis and our results are especially relevant to settings where both the number of predictors and the number of observations are large. Two important aspects of the proposed approach are (i) the data are assumed to be drawn from a multivariate normal distribution and (ii) we take advantage of an asymptotic framework that is appropriate for high-dimensional data analysis and offers great simplifications over many existing approaches to studying shrinkage estimators for the linear model. This lays the groundwork for a detailed, yet transparent, comparative analysis of the different estimators, which helps to shed light on their relative merits. For instance, we utilize results from random matrix theory to obtain explicit closed-form expressions for the asymptotic predictive risk of the estimators considered herein (in fact, many of the relevant results are non-asymptotic). Additionally, we identify minimax ridge and James-Stein estimators, which outperform previously proposed shrinkage estimators, and prove that if the population predictor covariance is known, or if an operator norm-consistent estimator for the population predictor covariance is available, then the ridge estimator has smaller predictive risk than the James-Stein estimator. Bayesian Inference for Finite Population Quantiles from Unequal Probability Samples *Qixuan Chen1, Michael R. Elliott2 and Roderick J.A. Little2 1 Department of Biostatistics, Columbia University 2 Department of Biostatistics, University of Michigan [email protected] This paper develops two robust Bayesian model-based estimators of finite population quantiles for continuous survey variables in unequal probability sampling. The first method estimates the cumulative distribution function of the continuous survey variable by fitting a number of probit penalized spline regression models on the inclusion probabilities. The finite population quantiles are then obtained by inverting the estimated distribution function. This method is quite computationally demanding. The second method predicts non-sampled values assuming a smoothly-varying relationship between the continuous survey variable and the probability of inclusion, by modeling both the mean function and the variance function using splines. Simulation studies show that both methods yield smaller root mean squared errors than the sample-weighted estimator. When the sample size is small, the 95% credible intervals of the two new methods have coverage closer to the nominal level than the sample-weighted estimator.
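The prediction idea behind the second method can be caricatured in a few lines of R, with a cubic polynomial standing in for the penalized splines and no variance modeling or Bayesian machinery (all names and settings below are illustrative).
set.seed(5)
N <- 5000
p_incl <- runif(N, 0.02, 0.3)                       # inclusion probabilities
y <- 10 + 20 * sqrt(p_incl) + rnorm(N, sd = 2)      # survey variable related to the design
s <- runif(N) < p_incl                              # Poisson-type sample indicator
dat <- data.frame(y, p_incl)
fit <- lm(y ~ poly(p_incl, 3), data = dat, subset = s)
y_pred <- predict(fit, newdata = dat[!s, ])         # predict the non-sampled units
quantile(c(y[s], y_pred), 0.5)                      # model-based estimate of the population median
quantile(y, 0.5)                                    # true finite-population median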

Session 95: Complex Multivariate Outcomes in Biomedical Science Estimation of Piecewise Constant Function from 1D or 2D Correlated Signals in an fMRI Experiment Johan Lim1 and *Sang Han Lee2 1 Seoul National University 2 Nathan S. Kline Institute [email protected] In this work, we study penalized least squares methods to estimate a piecewise constant function from 1D or 2D signals and their applications to fMRI experiments. First, in functional magnetic resonance imaging (fMRI) experiments, the blood oxygenation level-dependent (BOLD) signal measured in response to input stimuli is temporally delayed and distorted due to various technical reasons. Here, reconstruction of the input stimulus function allows the fMRI experiment to be evaluated as a communication system. It further


provides insight into actual brain activity during task activation with less temporal blurring, and may be considered as a first step toward estimation of the true neuronal input function. The reconstruction problem can be formulated into the 1-dimensional piecewise function estimation problem. Second, in a task-based fMRI experiment, the activation of a voxel is measured by t (or F) statistics computed from the BOLD signals of each voxel. It is known that a brain region gets activated as a whole and introduces spatial dependence of t statistics among neighboring voxels (Woolrich et al., 2005). We are interested in finding a few clusters of voxels which are active. The problem is again formulated into the estimation of 2-dimensional piecewise constant function from statistical parametric maps (sets of t- or F-statistics) of subjects. In this work, we study two penalties, the complexity penalty (L0 penalty) and the lasso-type penalty (L1 penalty) to estimate piecewise constant functions from 1-D or 2-D signals, and compare their results. Lower-Dimensional Approximation for Functional Data with Its Application to Screening Young Children’s Growth Paths � Wenfei Zhang and Ying Wei Columbia University [email protected] Growth charts are commonly used for screening children’s growth. Current methods consider one measurement at a specific time. More informative screening can be achieved by studying the entire growth path. We propose a statistical method for screening growth paths by finding the lower-dimensional approximation of growth curves (functional data). The method is based on alternating regression, using B-splines to represent the growth curves. The growth charts using growth paths can be constructed by applying the multivariate quantile contours of the projection scores, which are obtained via approximation. The proposed method is applied to a Finnish growth data to monitor children’s growth during puberty. Penalized Cluster Analysis with Applications to Family Data � Yixin Fang1 and Junhui Wang2 1 New York University 2 University of Illinois at Chicago [email protected] The goal of cluster analysis is to assign observations into clusters so that observations in the same cluster are similar in some sense. Many clustering methods have been developed in the statistical literature, but these methods are inappropriate for clustering family data, which possess intrinsic familial structure. To incorporate the familial structure, we propose a form of penalized cluster analysis with a tuning parameter controlling the tradeoff between the observation dissimilarity and the familial structure. The tuning parameter is selected based on the concept of clustering stability. The effectiveness of the method is illustrated via simulations and an application to a family study of asthma. Massively Parallel Nonparametrics � Philip Reiss1 and Lei Huang2 1 Nathan Kline Institute, New York University 2 New York University [email protected] Scatterplot smoothing, and testing a polynomial null model against a smooth alternative, are well-studied problems in statistics. But high-dimensional data applications may present us with the new challenge of performing these inferential tasks many thousands of times in a computationally feasible manner. This talk will introduce novel algorithms for meeting this challenge: massively parallel tuning parameter selection for nonparametric regression, and massively ICSA Applied Statistics Symposium 2011, NYC, June 26-29

parallel restricted likelihood ratio tests for polynomial null hypotheses. The ideas are applied to a neuroimaging data set, in which the expectation of a quantity measured at each of tens of thousands of brain locations is modeled as a smooth function of age. Our analysis employs a novel functional-data clustering approach to illustrate how the estimated developmental trajectories vary across the brain.
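The computational trick that makes such massively parallel smoothing feasible can be sketched in R: when all brain locations are observed at the same ages, a single smoother matrix can be applied to every location at once (a simple kernel smoother is used here in place of the penalized spline and testing machinery of the talk).
set.seed(6)
n <- 80; V <- 10000                              # common ages and many brain locations
age <- sort(runif(n, 5, 25))
K <- outer(age, age, function(a, b) dnorm((a - b) / 2))
S <- K / rowSums(K)                              # one kernel smoother (hat) matrix, rows sum to 1
Y <- matrix(log(age), n, V) + matrix(rnorm(n * V, sd = 0.5), n, V)   # V noisy trajectories
Yhat <- S %*% Y                                  # all V smooths in a single matrix product
dim(Yhat)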

Session 96: Application of Machine Learning Approaches in Biomedical Research Estimating Planned Sales Call Frequencies with Incomplete Information Using the EM Algorithm *Lan Ma Nygren and Lewis Coopersmith Rider University [email protected] We consider estimating planned sales call frequencies of a selling company with incomplete information caused by short recording durations in diary surveys. For practical reasons, it is necessary to keep the recording period short. Missing data occur when the recording period is not long enough to include observations with low call frequencies. We derive the maximum likelihood estimators of the multinomial cell probabilities for the planned sales call frequencies using the expectation maximization (EM) algorithm. We show that the EM algorithm estimators have good asymptotic properties in terms of both bias and mean squared error (MSE) and are more accurate and reliable than the estimators obtained by the naive approach of treating the absence of a sales call as a non-called-on respondent (i.e., zero frequency). The effect on the estimators when the number of frequency classes increases is also investigated. Multilevel Latent Class Analysis of Stages of Change for Multiple Health Behaviors *Luohua Jiang1, Jannette Beals2, Christina Mitchell2, Spero Manson2, Kelly Acton2 and Yvette Roubideaux3 1 Texas A&M Health Science Center 2 University of Colorado, Denver 3 Indian Health Service [email protected] The stages of change theory calls for stage-tailored interventions for participants who are at different stages in terms of readiness to change their behaviors. Nowadays, effective lifestyle interventions to prevent chronic diseases usually target multiple behaviors at the same time. Looking at the stages of change for multiple behaviors at the same time may result in many different patterns in readiness to change across behaviors. We propose to use latent class analysis (LCA), a statistical method for finding subgroups of related cases (latent classes) from multivariate categorical data, to reduce data dimension and to summarize stages of change for multiple health behaviors among patients with pre-diabetes from 36 American Indian and Alaska Native (AI/AN) communities. Standard LCA assumes that observations are independent within each latent class. This assumption may not be valid for relatively complicated data structures, such as data collected from multi-site projects where participants clustered within each site are usually correlated. Recent developments in multilevel LCA have made it possible to perform LCA for clustered data. Using multilevel LCA, three classes were identified among participants of the Special Diabetes Program for Indians Diabetes Prevention Program based on the distribution of multiple stages of change variables. The relationships between these latent classes and both individual- and site-level characteristics were investigated

using multilevel latent class regression. Simulations shed light on the importance of accounting for the clustered data structure when analyzing this type of data, especially for the relationship between latent classes and site-level characteristics. Outcome Weighted Learning for Selecting Individualized Treatment Regimens *Yingqi Zhao, Donglin Zeng and Michael Kosorok Department of Biostatistics, University of North Carolina at Chapel Hill [email protected] There is a growing interest in discovering individualized therapy for different patients due to heterogeneous responses to treatment. The effects of treatments are compared via the expected mean response or value function. An individualized treatment rule is a deterministic decision rule which maximizes this value function given patient characteristics. Considering a binary treatment, we transform the original question of interest into an equivalent classification problem. We aim at finding the best classifier based on a weighted combination of 0-1 loss, with weight given by clinical outcome. It turns out that the resulting optimal treatment rule is the Bayes rule for the corresponding classification problem. Although the desired loss function is non-convex and difficult to compute, we can develop a tractable estimation procedure by finding a convex surrogate loss that results in the same classifier. Specifically, we apply a weighted support vector machine technique, and a finite sample bound is derived on the difference between the mean response of the estimated individualized treatment rule and that of the optimal rule. Simulation studies show the superiority of the proposed approach. Selecting a Target Patient Population Effectively for Desirable Treatment Benefits with the Data from a Randomized Comparative Study *Lihui Zhao1, Lu Tian2, Tianxi Cai1, Brian Claggett1 and Lee-Jen Wei1 1 Harvard University 2 Stanford University [email protected] The conventional approach to comparing a new treatment with a standard therapy is often based on a summary measure for the treatment difference over the entire study population. However, a positive (negative) trial with respect to such a global measure does not mean that every future patient would (would not) benefit from the new treatment. In this research, we propose a systematic, two-stage estimation procedure for effectively selecting a target patient population with desirable treatment benefits using the data from a randomized comparative study. We first develop a scoring index to stratify study patients based on parametric or semiparametric working models for treatment differences. We then use a nonparametric method to estimate the average treatment difference for the "promising" subgroups selected by the score via a cross validation procedure. The proposal can also be used for identifying a target patient population for a future comparative clinical trial. Our method is illustrated with two real examples. Kernel Machine Tests for Rare Genetic Variants in Sequencing Studies Michael C. Wu University of North Carolina at Chapel Hill [email protected] Although large-scale genetic association studies have identified more than 1,000 individual single nucleotide polymorphisms (SNPs) associated with a range of complex traits, identified SNPs explain only a


fraction of the total heritability attributable to genetic factors. Rare variants, which can now be genotyped using high-throughput sequencing technology, can potentially explain some of the missing heritability. Region-based testing, wherein multiple sequenced rare variants within a genomic region (broadly defined) are grouped and then their cumulative effect on the trait is tested, has become the standard strategy for rare variant testing. A wide range of statistical methods have been developed for testing rare variants, with each test having optimal power under certain conditions. In practice, however, it is unclear which method should be used, since the optimal choice depends on the underlying genetic architecture, which is unknown; indeed, knowledge of the architecture would preclude the need for analysis. Tempting solutions, such as taking the minimum p-value across tests, may lead to inflated type I error, and permutation can be slow and introduce difficulties with covariate adjustment. Therefore, we take a pragmatic approach to the problem and demonstrate that many commonly used tests are special cases of the kernel machine regression test under different kernels. Then, exploiting these connections, we develop a test that optimizes over a range of existing tests (kernels). Through simulations and real data analysis, we show that the proposed test has excellent power across a range of settings with great gains over poor choices of tests, but only modest loss in power compared to using the (unknown in practice) optimal test.
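A minimal version of the kernel machine idea can be sketched in R with a plain linear kernel, no covariates, and a permutation p-value in place of the analytic null distribution; the weights and kernel choice below are placeholders rather than the proposed optimized test.
set.seed(7)
n <- 300; p <- 20
G <- matrix(rbinom(n * p, 2, 0.02), n, p)       # rare-variant genotypes for one region
y <- rnorm(n) + 0.8 * rowSums(G[, 1:3])         # a few causal variants
K <- G %*% t(G)                                  # linear kernel (variant weights omitted here)
Q <- function(y) { r <- y - mean(y); drop(t(r) %*% K %*% r) }   # score-type statistic
Qobs <- Q(y)
Qnull <- replicate(999, Q(sample(y)))            # permute the phenotype under the null
mean(c(Qnull, Qobs) >= Qobs)                     # permutation p-value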

Session 97: Challenges in the Development of Regression Models Lack-of-Fit Testing of a Regression Model with Response Missing at Random Xiaoyu Li Michigan State University [email protected] This paper proposes a class of lack-of-fit tests for fitting a linear regression model when some response variables are missing at random. These tests are based on a class of minimum integrated square distances between a kernel type estimator of a regression function and the parametric regression function being fitted. These tests are shown to be consistent against a large class of fixed alternatives. The corresponding test statistics are shown to have asymptotic normal distributions under null hypothesis and a class of nonparametric local alternatives. Some simulation results are also presented. Statistical Modeling for Study on Health Behaviors Associated with Use of Body Building, Weight Loss, and Performance Enhancing Supplements � Tzu-Cheg Kao1 , Yi-Ting Tsai1 , Daniel Burnett2 , Mark Stephens3 and Patricia A. Deuster4 1 Division of Epidemiology and Biostatistics, PMB, Uniformed Services University 2 General Preventive Medicine Residency Program, PMB, Uniformed Services University 3 Department of Family Medicine, Utah State University 4 CHAMP, Department of Military and Emergency Medicine, Utah State University [email protected] Information on the use of potentially harmful dietary supplements (DS)—body building (BB), weight loss (WL), and performance enhancing (PE)—and their associated health behaviors is limited among military personnel. The 2005 Survey of Health-Related Behaviors Among Military Personnel, which used a stratified, two-


stage sampling design, collected data on DS use and health behaviors. We want to identify demographic and behavioral factors associated with use of BB, WL, and PE. Polytomous logistic regression models will be used to assess associations between behavioral patterns and BB, WL, and PE, adjusting for demographic variables. Statistical issues related to the modeling will be discussed. Scaled Sparse Linear Regression � Tingni Sun and Cun-Hui Zhang Rutgers University [email protected] Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual squares and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs nearly nothing beyond the computation of a path of the sparse regression estimator for penalty levels above a threshold. For the scaled Lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the method yields simultaneously an estimator for the noise level and an estimated coefficient vector in the Lasso path satisfying certain oracle inequalities for the estimation of the noise level, prediction, and the estimation of regression coefficients. These oracle inequalities provide sufficient conditions for the consistency and asymptotic normality of the estimator for the noise level, including cases where the number of variables is of greater order than the sample size. Numerical results demonstrate the superior performance of the proposed method over an earlier proposal of joint convex minimization. Generalized Linear Varying-Coefficient Model with Auxiliary Covariates � Jianwei Chen and Qian Xu San Diego State University [email protected] The generalized linear varying-coefficient model is a useful extension of the generalized linear model. The model structure allows the coefficient to be a curve function with different time. In this talk, we propose three nonparametric estimation methods for generalized varying-coefficient model when there are surrogate information about the covariate mismeasured. Our methods are efficient method to handle the missing or mismeasured covariate data problem in a widely-used exponential family of distributions. The asymptotic normal properties of the proposed estimators are established. Simulation study results show that the proposed imputation methods outperform the simple methods which are based on validation data only or naive method. Nonlinear Varying Coefficient Model and Its Applications � Esra Kurum1 , Runze Li1 , Damla Senturk1 and Yang Wang 2 1 The Pennsylvania State University 2 Freddie Mac [email protected] Motivated by an empirical analysis of a data set collected in ecology studies, we propose nonlinear varying coefficient models, where the relationship between the predictors and the response variable is allowed to be nonlinear. A local linear estimation procedure is developed for the nonlinear varying coefficient models and asymptotic normality of the proposed estimators is established leading to pointwise asymptotic confidence bands for the coefficient functions. We further propose a generalized F test to study whether the coefficient ICSA Applied Statistics Symposium 2011, NYC, June 26-29

functions vary over a covariate. We illustrate the proposed methodology via an application to an ecology data set and study the finite sample performance by Monte Carlo simulation studies.
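A toy version of local estimation for a varying coefficient can be written in a few lines of R (one covariate, a fixed bandwidth, simulated data; the models, tests and confidence bands above go well beyond this).
set.seed(8)
n <- 500
u <- runif(n); x <- rnorm(n)
y <- sin(2 * pi * u) * x + rnorm(n, sd = 0.3)           # true coefficient beta(u) = sin(2*pi*u)
grid <- seq(0.05, 0.95, by = 0.05)
beta_hat <- sapply(grid, function(u0) {
  w <- dnorm((u - u0) / 0.1)                            # kernel weights centered at u0
  coef(lm(y ~ I(u - u0) + x + x:I(u - u0), weights = w))["x"]
})
# beta_hat traces the coefficient function sin(2*pi*u) over the grid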

Session 98: Recent Development in Measurement Error Models Regression-Assisted Deconvolution Julie McIntyre1 and *Leonard A. Stefanski2 1 Department of Mathematics and Statistics, University of Alaska, Fairbanks 2 Department of Statistics, North Carolina State University [email protected] Density deconvolution estimators typically rely only on assumptions about the distribution of the target variable X and the error in its measurement W, and ignore information available in auxiliary variables. Yet often W is measured in the context of a large study, and thus the data set containing W also contains several other variables, some of which are likely to be correlated with X. We describe a deconvolution method that assumes the availability of a covariate vector that is statistically related to X by a mean-variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Monte Carlo experiments show that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. The method is illustrated using anthropometric measurements of newborns to estimate the density function of newborn length. Nonlinear Models with Measurement Errors Subject to Single-Indexed Distortion Jun Zhang1, Lixing Zhu2 and *Hua Liang3 1 East China Normal University 2 Hong Kong Baptist University 3 University of Rochester [email protected] We study nonlinear regression models when both the response and the predictors are measured with errors and distorted as single-index models of some observable confounding variables, and propose a multicovariate-adjusted procedure. We first examine the relationship between the observed primary variables (observed response and observed predictors) and the confounding variables by appropriately estimating the index, and then develop a semiparametric profile nonlinear least squares estimation procedure for the parameters of interest after we calibrate the error-prone response and predictors. Asymptotic properties of the proposed estimators are established. To avoid estimating the asymptotic covariance matrix, which contains the infinite-dimensional nuisance distorting functions and the single index, and to improve the accuracy of the proposed estimation, we also propose an empirical likelihood-based statistic, which is shown to have an asymptotic chi-square distribution. A simulation study is conducted to evaluate the performance of the proposed methods and a real dataset is analyzed for an illustration. Novel Methods for Misclassification Correction *John Staudenmayer1 and Meng-Shiou Shieh2 1 University of Massachusetts at Amherst 2 Baystate Medical Center [email protected] This talk proposes novel estimation and confidence interval construction methods for binary data that are subject to misclassifica-

tion. After a review of misclassification, reclassification, and existing correction methods, we develop two new estimation methods: a bias corrected estimator and a “partially corrected” estimator that takes advantage of the fact that sometimes the naive estimator has little or no bias and has a small variance. We show that the mean squared error of the partially corrected estimator is smaller than that of existing estimators. Fieller’s method and the multivariate delta method are used to create confidence intervals, and we show that Fieller’s method has some advantages. We compare our methods and existing methods with a simulation experiment, and we illustrate them on data from a recent study that measured physical activity with accelerometers. Quantile Regression with Measurement Errors � Ying Wei1 and Raymond Carroll2 1 Department of Biostatistics, Columbia University 2 Department of Statistics, Texas A&M University [email protected] Regression quantiles can be substantially biased when the covariates are measured with error. In this paper we propose a new method that produces consistent linear quantile estimation in the presence of covariate measurement error. The method corrects the measurement error induced bias by constructing joint estimating equations that simultaneously hold for all the quantile levels. An iterative EM-type estimation algorithm to obtain the solutions to such joint estimation equations is provided. The finite sample performance of the proposed method is investigated in a simulation study, and compared to the standard regression calibration approach. Finally, we apply our methodology to part of the National Collaborative Perinatal Project growth data, a longitudinal study with an unusual measurement error structure.

Session 99: Recent Developments in Time Series Banding the Sample Covariance Matrix of Stationary Processes � Mohsen Pourahmadi1 and Wei Biao Wu2 1 Texas A&M University 2 University of Chicago [email protected] We consider banding as a way of estimating covariance matrices of stationary processes. Under a short-range dependence condition for a wide class of nonlinear processes, it is shown that the banded covariance matrix estimates converge in operator norm to the true covariance matrix with explicit rates of convergence. Connections with the covariance estimation of high-dimensional data, spectral density estimation and order selection for AR(MA) models will be discussed. A sub-sampling approach is used to choose the optimal banding parameter, and simulation results reveal its satisfactory performance for linear and certain nonlinear processes. The procedure is solely based on the second-order characteristics of the underlying process. Selec- tion of the band parameter for long-memory and nonlinear processes remains open at this time. Modeling Dependence in a Network of Brain Signals Cristina Gorrostieta and � Hernando Ombao Brown University [email protected] In this talk, we shall discuss methods for characterizing dependence between brain regions. My own interest in this area stems from a growing body of evidence suggesting that various neurological disorders, including Alzheimer’s disease, depression, and Parkinson’s disease may be associated with altered brain connectivity.


Abstracts Dependence may be portrayed in a number of ways. This talk will be focused on measures that depict interactions in oscillations between brain regions. First, we shall discuss partial coherence which essentially identifies the frequency bands that drive direct linear association between regions. However, there are computational challenges to estimating this measure under high dimensionality. To overcome this problem, we developed a generalized shrinkage procedure which is a weighted average of a highly structured parametric estimator and a non-parametric estimator (based on mildly smoothed periodograms). Theoretical analysis and simulation studies demonstrate that the generalized shrinkage method has a lower mean-squared error than the standard approaches (Welch’s and multi-taper). Second, we develop more comprehensive measures of coherence that capture complex dependence structures in brain signals. The classical notion of coherence pertains only to contemporaneous single-frequency interactions between signals. To generalize this notion, we introduce the time-lagged dual-frequency coherence which measures, as a specific example, oscillatory interactions between alpha activity on a current time block at one channel and beta activity on a future time block at another channel. We develop formal methods for statistical inference under the framework of harmonizable processes. This new approach will be applied to analyze an electroencephalographic data set to investigate dependence between the visual, parietal and pre-motor cortices under the context of a visual-motor task. Gradient Based Cross Validation Method Daniel Henderson1 , � Qi Li2 and Chris Parmeter3 1 State University of New York at Binghamton 2 Texas A&M University 3 University of Miami [email protected] Data-driven bandwidth selection based on the gradient estimation of an unknown regression function is considered. Uncovering gradients nonparametrically is of crucial importance across a broad range of economic environments such as determining risk premium or recovering distribution of individual preferences. The procedure developed here is shown to deliver bandwidths which have the optimal rate of convergence for the estimation of gradients. We provide a detailed theoretical account of this new approach to smoothing parameter selection. An important additional advantage of our proposed method over the conventional cross-validation bandwidth selection method is that our approach overcomes the tendency of traditional data-driven approaches to engage in under smoothing. Both simulated and several empirical examples evidence showcase the finite sample attraction of this new mechanism. Inference for Non-Stationary Time Series � Zhibiao Zhao and Xiaoye Li The Pennsylvania State University [email protected] We study statistical inference for a class of non-stationary time series with time-dependent variances. Due to non-stationarity and the large number of unknown parameters, existing methods that are developed for stationary or locally stationary time series are not applicable. Based on a self-normalization technique, we address several inference problems, including self-normalized Central Limit Theorem, self-normalized cumulative sum test for change-point problem, long-run variance estimation through blockwise self-normalization, and self-normalization based wild bootstrap for non-stationary time series. Monte Carlo simulation studies show that the proposed self-


normalization based methods outperform stationarity based alternatives. We demonstrate the proposed methodology using two real data sets: annual mean precipitation rates in Seoul during 1771– 2000, and quarterly U.S. Gross National Product growth rates during 1947–2002.
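A rough, self-contained R sketch of the banding step described in the first abstract of this session (Pourahmadi and Wu), not the authors' implementation: the Toeplitz matrix of sample autocovariances is formed and all entries beyond a band of width l are set to zero. The AR(1) example and the fixed band width are illustrative choices; the sub-sampling rule for choosing l is not reproduced here.

```r
## Banding a stationary-process covariance estimate (illustrative only).
band_cov <- function(x, l) {
  n <- length(x)
  # sample autocovariances gamma_hat(0), ..., gamma_hat(n - 1)
  g <- as.numeric(acf(x, lag.max = n - 1, type = "covariance", plot = FALSE)$acf)
  S <- toeplitz(g)                    # sample covariance matrix in Toeplitz form
  S[abs(row(S) - col(S)) > l] <- 0    # banding: zero out entries beyond lag l
  S
}

set.seed(1)
x  <- arima.sim(list(ar = 0.6), n = 500)   # a short-range dependent AR(1) series
Sb <- band_cov(x, l = 10)
Sb[1:5, 1:5]
```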

Session 100: Challenging Topics in Longitudinal Data Some Aspects of Analyzing Longitudinal Data Using Functional Data Analysis Methods Naisyin Wang [email protected] In this talk, we will focus on regression analysis that links functional or longitudinal covariate processes to a scalar response. The main regression methodologies we consider are originated from traditional functional data analysis. We will discuss various issues that could play a role in the outcomes of analysis. Some key focus would be on the choices of basis functions, global or local oriented, based on the goals of the study and other issues such as model determinations trailing from the decision. We will discuss asymptotic properties of the estimators we propose. Numerical outcomes from simulation and data analysis studies are used to illustrate our findings. The talk contains results from joint works with R.J.Carroll, Y.Li, N.Y. Wang and J.H.Zhou. Marginal Methods for Correlated Binary Data with Misclassified Responses Zhijian Chen, Grace Y. Yi and � Changbao Wu University of Waterloo [email protected] Misclassification has been a long standing concern in medical research. Although there has been much research concerning errorprone covariates, relatively little work has been directed to problems on response variable subject to error. In this paper we focus on misclassification in clustered or longitudinal outcomes. We propose marginal analysis methods to handle binary responses which are subject to misclassification. The proposed methods have several appealing features, including simultaneous inference for both marginal mean and association parameters, and they can handle misclassified responses for a number of practical scenarios, such as the case with a validation subsample or replicates. Furthermore, the proposed methods are robust to model misspecification in a sense that no full distributional assumptions are required. Numerical studies demonstrate satisfactory performance of the proposed methods under a variety of settings. Modelling of Dose-Response-Time Data � Bjorn Bornkamp1 , Chyi-Hung Hsu2 , Jose Pinheiro2 and Frank Bretz1 1 Novartis Pharmaceuticals Corporation 2 Johnson & Johnson [email protected] Endpoints of interest in clinical trials are usually defined as improvement in an efficacy measure at a pre-specified timepoint after start of treatment. Typically data are available on the efficacy measure also for earlier timepoints, however, this information is usually not taken into account in the final analysis of the data. A reason for this might be that it is rather challenging to set up statistical models for dose-response-time data, because the the dependency of time and dose on the efficacy measure is typically nonlinear and highly indication-specific so that pre-specification of these ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts models (as required when writing the trial protocol) is difficult in practice. In this talk we present methods that allow to model dose-responsetime data, but which are robust enough to cover a broad variety of dose-response-time relationships a-priori. The presented methods will be compared in an extensive simulation study. In particular we will investigate to what extend one can increase estimation efficiency at the timepoint of interest by the use of longitudinal information.
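To make the dose-response-time idea above concrete, here is a hypothetical sketch of one very simple model of that type: an Emax curve in dose whose effect emerges over time with first-order onset. The model form, the parameter names (emax, ed50, k) and the simulated data are illustrative assumptions, not the models compared in the talk.

```r
## Simple dose-response-time model fitted by nonlinear least squares.
set.seed(2)
dat <- expand.grid(dose = c(0, 10, 25, 50, 100), week = 1:8, id = 1:20)
dat$y <- with(dat, (15 * dose / (30 + dose)) * (1 - exp(-0.5 * week))) +
  rnorm(nrow(dat), sd = 3)

fit <- nls(y ~ (emax * dose / (ed50 + dose)) * (1 - exp(-k * week)),
           data = dat, start = list(emax = 10, ed50 = 20, k = 0.3))
round(summary(fit)$coefficients, 3)

# estimated dose-response at the protocol-specified time point (week 8)
predict(fit, newdata = data.frame(dose = c(0, 10, 25, 50, 100), week = 8))
```

Using the earlier-week measurements in this way is what allows the fit to stabilize the estimate at the final time point.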

Session 101: High-Dimensional Models

A Beta-Mixture Model for Assessing Genetic Population Structure � Dipak K. Dey1 , Rongwei Fu2 and Kent Holsinger1 1 University of Connecticut 2 Oregon Health and Science University [email protected] Populations may become differentiated from one another as a result of genetic drift. The amounts and patterns of differentiation at neutral loci are determined by local population sizes, migration rates among populations, and mutation rates. We provide exact analytical expressions for the mean, variance and covariance of a stochastic model for hierarchically structured populations subject to migration, mutation, and drift. In addition to the expected correlation in allele frequencies among populations in the same geographical region, we demonstrate that there is a substantial correlation in allele frequencies among regions at the top level of the hierarchy. We propose a hierarchical Bayesian model for inference of Wright’s F-statistics. We illustrate the approach through an analysis of human microsatellite data, revealing that approaches ignoring the among-population correlation of allele frequencies underestimate the amount of genetic differentiation among major geographical population groups by approximately 50%, and we discuss the implications of these results for the use and interpretation of F-statistics in evolutionary studies. We further provide exact expressions for the first two moments of a stochastic model appropriate for studying microsatellite evolution under the assumption that the range of allele sizes is bounded. Using these results we study the behavior of several measures related to Wright’s F_ST, including Slatkin’s R_ST.

Functional LARS for High Dimensional Additive Models Lifeng Wang Michigan State University [email protected] Least Angle Regression (LARS) proves to be an efficient algorithm for high-dimensional linear regression, particularly for problems with p >> n. However, its application to nonparametric additive models remains unclear. Motivated by its geometric interpretation with infinite samples, we propose a functional LARS algorithm to perform nonparametric regression and feature selection simultaneously for high-dimensional additive models. The proposed algorithm constructs the whole regularization solution path, which allows for adaptive tuning and efficient model selection. We investigate its connection to the Boosting and Lasso algorithms for additive models, and illustrate its performance via both simulated and real data.

Consistent Model Selection by LASSO Yongli Zhang University of Oregon [email protected] For linear models, the model space size is an exponential function of the number of candidate predictors; so in high-dimensional regression, the exhaustive search is infeasible because of the huge computing workload. Alternatively, the stepwise model selection method LASSO offers a computationally feasible approach. However, LASSO is not able to identify the true model if the Irrepresentable Condition (Zhao and Yu, 2006), which requires that predictors in the true model cannot be highly correlated with the variables outside the true model, is not satisfied. Moreover, this condition is hard to check and often violated in high-dimensional data. In this article I develop a computationally feasible consistent model selection method by adding a certain amount of randomness to the feature matrix to weaken the correlation between predictors. This scheme offers a remedy when the Irrepresentable Condition is not satisfied. Both theoretical and simulation studies show that this method identifies the smallest true model with probability converging to one.

Additive Risk Analysis of Gene Expression Data via Correlation Principal Component Regression � Yichuan Zhao and Guoshen Wang Georgia State University [email protected] In order to predict future patients’ survival time based on their microarray gene expression data, one interesting question is how to relate genes to survival outcomes. In this talk, by applying a semiparametric additive risk model in survival analysis, we propose an approach to conduct a careful analysis of gene expression data with the focus on the model’s predictive ability. In the proposed method, we apply the correlation principal component regression to deal with right-censored survival data under the semiparametric additive risk model framework with high-dimensional covariates. We also employ the time-dependent area under the receiver operating characteristic curve and the root mean squared error of prediction to assess how well the model can predict the survival time. Furthermore, the proposed method is able to identify genes that are significantly related to the disease. Finally, the proposed approach is illustrated with the diffuse large B-cell lymphoma and breast cancer data sets. The results show that the model fits both data sets very well.
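The consistent-model-selection abstract above rests on weakening the correlation among predictors before running the lasso. The R sketch below is only a schematic of that general idea, assuming the glmnet package is available: the additive Gaussian perturbation and the value tau = 0.5 are placeholders, not the randomization scheme or the theory developed in the talk.

```r
## Lasso selection on a highly correlated design, with and without a
## simple random perturbation of the feature matrix (schematic only).
library(glmnet)

set.seed(3)
n <- 100; p <- 50
z <- rnorm(n)
X <- sapply(1:p, function(j) 0.9 * z + sqrt(1 - 0.9^2) * rnorm(n))  # strongly correlated columns
beta <- c(2, -2, rep(0, p - 2))
y <- as.numeric(X %*% beta + rnorm(n))

tau   <- 0.5
Xpert <- X + matrix(rnorm(n * p, sd = tau), n, p)   # add randomness to the feature matrix

sel <- function(x, y) {
  b <- as.numeric(coef(cv.glmnet(x, y), s = "lambda.min"))[-1]
  which(b != 0)
}
list(selected_plain = sel(X, y), selected_perturbed = sel(Xpert, y))
```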

Session 102: Phase I/II Clinical Studies: Safety versus Efficacy Optimizing the Concentration and Bolus of a Drug Delivered by Continuous Infusion � Peter F. Thall1 , Aniko Szabo2 , Hoang Q. Nguyen1 , Catherine M. Amlie-Lefond2 and Osama O. Zaidat2 1 The University of Texas MD Anderson Cancer Center 2 Medical College of Wisconsin [email protected] We consider treatment regimes in which an agent is administered continuously at a specified concentration until either a therapeutic response is achieved or a predetermined maximum infusion time is reached. A portion of the planned maximum total amount of the agent is given as an initial bolus, with the possibility of achieving a response immediately. For such regimes, the amount of the agent received by the patient depends on the time to response, which in turn may affect the risk of toxicity. When response is evaluated periodically, response time is interval censored. We address the problem of


Abstracts designing a phase I/II clinical trial in which such response time data and a binary toxicity indicator are used together to jointly optimize the concentration and the size of the bolus. We propose a sequentially adaptive Bayesian design that chooses the optimal treatment for successive patients by maximizing the posterior mean utility of the joint efficacy-toxicity outcome. The methodology is illustrated by a trial of tissue plasminogen activator infused intra-arterially as rapid treatment for acute ischemic stroke. A Bayesian Adaptive Design for Multi-dose, Randomized, Placebo-controlled Phase I/II Trials Yuan Ji The University of Texas MD Anderson Cancer Center [email protected] A Bayesian adaptive design has been developed to simultaneously study the safety and efficacy of multiple doses of a regimen in a randomized placebo-controlled trial. Patient enrollment does not need to stop when transitioning from the evaluation of the dose safety and tolerability into the assessment of its efficacy. The cohort expansion for dose finding is adaptive based on the interim comparisons between each dose and the placebo. A set of Bayesian rules is constructed to guide the decisions on dose cohort expansion. Performance of this design has been evaluated by simulations to mimic the trial conduct and outcome in a variety of dose toxicity and efficacy scenarios. Compared to the conventional parallel group dosefinding strategy, our proposed adaptive design is better at removing ineffective doses, reducing the total sample size, and maintaining adequate power for dose finding. The proposed design has been implemented in an ongoing study. Continual Reassessment Method with Multiple Toxicity Constraints � Shing Lee, Bin Cheng and Ying Kuen Cheung Columbia University [email protected] In dose finding trials, toxicity is generally taken to be a binary outcome, dose limiting toxicity or not. In many settings, we are concerned with the gradations of severe toxicities that are considered dose limiting as well as the severity differences between toxicities types of the same grade. To differentiate the tolerance for different toxicity types and grades, we propose a novel extension of the continual reassessment method (CRM) that explicitly accounts for multiple toxicity constraints. The method has an explicit objective defined in terms of the probability of toxicity which makes it intuitive and in line with current practice. It uses a latent modeling approach and can be applied on a continuous or ordinal toxicity measure. We apply the proposed methods to redesign a bortezomib trial in lymphoma patients and compare their performance with that of the existing methods. Based on simulations, our proposed method achieves comparable accuracy in identifying the maximum tolerated dose but has better control of the erroneous allocation and recommendation of an overdose through the explicit constraint on highergrade toxicities.
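For readers unfamiliar with the CRM machinery that the last abstract extends, the following is a minimal sketch of the standard single-constraint, one-parameter CRM posterior update. The skeleton, the normal prior variance, the target rate and the toy data are made-up illustrations; the multiple-toxicity-constraint and latent-variable extensions described above are not reproduced.

```r
## One-parameter CRM: posterior toxicity probabilities on a grid.
skeleton <- c(0.05, 0.12, 0.25, 0.40, 0.55)   # working model: p_d(a) = skeleton^exp(a)
target   <- 0.25
d   <- c(1, 1, 2, 2, 3, 3)                    # dose level given to each patient
tox <- c(0, 0, 0, 1, 0, 1)                    # dose-limiting toxicity indicator

a     <- seq(-4, 4, length.out = 801)         # grid for the model parameter
da    <- diff(a)[1]
prior <- dnorm(a, 0, sqrt(1.34))
lik   <- sapply(a, function(ai) {
  p <- skeleton[d]^exp(ai)
  prod(p^tox * (1 - p)^(1 - tox))
})
post <- prior * lik / sum(prior * lik * da)

# posterior mean toxicity at each dose; the next dose is the one closest to target
ptox <- sapply(skeleton, function(p0) sum(p0^exp(a) * post * da))
round(ptox, 3)
which.min(abs(ptox - target))
```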

Session 103: Network Analysis Multiscale Community Blockmodel for Network Exploration Eric Xing Carnegie Mellon University [email protected] Real world networks exhibit a complex set of phenomena such as underlying hierarchical organization, multiscale interaction, and


varying topologies of communities. Most existing methods do not adequately capture the intrinsic interplay among such phenomena. We propose a nonparametric Multiscale Community Blockmodel (MSCB) to model the generation of hierarchies in social communities, selective membership of actors to subsets of these communities, and the resultant networks due to within- and cross- community interactions. By using the nested Chinese Restaurant Process, our model automatically infers the hierarchy structure from the data. We develop a collapsed gibbs sampling algorithm for posterior inference, conduct extensive validation using synthetic networks, and demonstrate the utility of our model in real-world datasets such as predator-prey networks and citation networks. Estimating Latent Processes on a Graph from Indirect Measurements � Edoardo Airoldi and Alexander Blocker Harvard University [email protected] Structured measurements and populations/samples with interfering units are ubiquitous in science and have become a focal point for discussion in the past few years. Formal statistical models for the analysis of this type of data have emerged as a major topic of interest in diverse areas of study. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online social networking websites such as Facebook and LinkedIn, and a host of more specialized professional networking communities has intensified interest in the study of graphs, structured measurements and interference. In this talk, I will review a few ideas and open areas of research that are central to this burgeoning literature, placing emphasis on the statistical and data analysis perspectives. I will then focus on the the problem of making inference on latent processes on a graph, with an application to the estimating pointto-point traffic volumes in a communication network from indirect measurements. Inference in this setting requires solving a sequence of ill-posed inverse problems, y(t)= A x(t). We develop a multilevel state-space model for mixing times series and an efficient approach to inference; a simple model is used to calibrate regularization parameters that lead to efficient inference in the multilevel state-space model. Our two-stage approach suggests an efficient inference strategy for multilevel models of multivariate time series. Predicting Behavior with Social Networks � Sharad Goel and Daniel G. Goldstein Yahoo! Research [email protected] With the availability of social network data, it has become possible to relate the behavior of individuals and their acquaintances. While the similarity of connected individuals is well established, it is unclear if and how social data can be used to predict behavior, and whether such predictions are more accurate than those of standard models. We employ a communications network to forecast diverse behaviors from patronizing a department store to joining a recreational league. Across all the domains, we find three general results. First, for most individuals, social data do not appreciably improve predictions. 
Second, social data are highly informative in identifying those individuals most likely to undertake actions. Finally, in

Abstracts identifying such individuals, social data generally improve the predictive accuracy of baseline models. Latent Space Models for Networks Using Aggregated Relational Data � Tyler McCormick1 and Tian Zheng2 1 Columbia University and University of Washington 2 Columbia University [email protected] Social networks have become an increasingly common framework for understanding and explaining social phenomena. But despite an abundance of sophisticated models, social network research has yet to realize its full potential, in part because of the difficulty of collecting social network data. In contrast, Aggregated Relational Data, commonly collected as questions of the form “How many X’s do you know¿‘, measure network relationships indirectly and are easily incorporated into standard surveys. We propose a latent space model where the propensity of an individual to know members of a given alter group (people named Michael, for example) is independent given the positions of the individual and the group in a latent “social space.“ This framework is similar in spirit to previous latent space models proposed for networks (Hoff , Raftery and Handcock (2002), for example) but doesn’t require that the entire network be observed. Using this framework, we derive evidence of social structure in personal acquaintances networks, estimate homogeneity of groups, and estimate individual and population gregariousness. Our method makes information about more complicated network structure available to the multitude of researchers who cannot practically or financially collect data from the entire network.
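A toy version of the ill-posed inverse problem y = A x mentioned in the Airoldi-Blocker abstract (link loads equal a routing matrix times unobserved origin-destination flows). The three-link, four-flow network and the ridge penalty below are illustrative stand-ins for the multilevel state-space model and calibrated regularization discussed in the talk.

```r
## Recovering latent flows x from aggregate link measurements y = A x + noise.
set.seed(5)
A <- rbind(c(1, 1, 0, 0),    # link 1 carries OD flows 1 and 2
           c(0, 1, 1, 0),    # link 2 carries OD flows 2 and 3
           c(0, 0, 1, 1))    # link 3 carries OD flows 3 and 4
x_true <- c(5, 2, 7, 1)
y <- as.numeric(A %*% x_true) + rnorm(3, sd = 0.2)

ridge_solve <- function(A, y, lambda) {
  solve(t(A) %*% A + lambda * diag(ncol(A)), t(A) %*% y)   # regularized least squares
}
cbind(truth = x_true, estimate = round(ridge_solve(A, y, lambda = 0.1), 2))
```

Because there are fewer equations than unknowns, some form of regularization or prior structure is unavoidable, which is exactly the point the abstract makes.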

Session 104: Recent Advances in Genome-Wide Association Studies Prediction with Scores of Tiny Effects: Lessons from GenomeWide Association Studies � Nilanjan Chatterjee, Mitchell H. Gail and Ju-Hyun Park National Cancer Institute, DCEG/BB [email protected] Although recent genome-wide association studies have led to the identification of many susceptibility loci for a variety of complex traits, the utility of these discoveries for predicting individualized risk has been modest. This talk will examine the potential utility of future risk models that may include additional susceptibility loci as well as non-genetic risk factors. In particular, we will describe methods for estimating number of underlying susceptibility loci for a trait and the distribution of their effect-sizes using data from recent genome-wide association studies. We will then show how such estimates can be used to assess the limits of performance of future prediction models including high-dimensional polygenic models that may include hundreds or thousands of SNPs. We will point out some of the intrinsic theoretical challenges for the general problem of prediction with scores of tiny effects. Testing and Estimation in Genome-Wide Association Studies through Penalized Splines � Yuanjia Wang and Huaihou Chen Columbia University [email protected] We propose a generalized F test of an unspecified nonparametric genetic function with multilevel genetic data applicable to both cross-sectional and longitudinal phenotypes. The procedure can also be used to test variance components in linear mixed effects ICSA Applied Statistics Symposium 2011, NYC, June 26-29

model where there exist nuisance variance components under the null, to compare two spline functions, and to test a spline function in an additive model with multiple covariates. By a mixed effects model representation of penalized splines, we imbed the test of a genetic function into testing fixed effects and a variance component in a mixed effects model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm to compute null distribution of the test statistic which improves computational efficiency significantly comparing to bootstrap. The spectral representation reveals a connection between likelihood ratio test in multiple variance components model and single component model. We examine our methods through simulations and apply them to compute genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS) where the usual bootstrap is computationally prohibitive (up to 108 simulations) and asymptotic approximation may be unreliable and conservative. In addition to testing, we also propose robust nonparametric estimation of unspecified genetic effect function and examine their asymptotic properties. A Powerful Association Test of Ordinal Traits in Samples with Related Individuals � Zuoheng Wang and Chengqing Wu Yale University [email protected] Many traits in health studies, such as cancer and psychiatric disorders, are recorded on a discrete, ordinal scale. Here we develop a novel method for association analysis of ordinal traits when some sampled individuals are related, with known relationships. We propose quasi-likelihood score tests for genotypes conditional on ordinal phenotype data. The conditional model for genotypes is derived from a proportional odds model for phenotypes. The resulting test is valid even when the phenotype model is misspecified and with either random or phenotype-based ascertainment. We perform simulation studies to evaluate the power and robustness of the proposed approach. The method is applied to a genome-wide association study of alcoholism. We also discuss the extension of our method in the presence of population substructure. Incorporating Biological Pathways via a Markov Random Field Model in Genome-Wide Association Studies � Min Chen1 , Judy Cho2 and Hongyu Zhao2 1 The University of Texas Southwestern Medical Center at Dallas 2 Yale University [email protected] Genome-wide association studies (GWAS) examine a large number of markers across the genome to identify associations between genetic variants and disease. Most published studies examine only single markers, which may be less informative than considering multiple markers and multiple genes jointly because genes may interact with each other to affect disease risk. Much knowledge has been accumulated in the literature on biological pathways and interactions. It is conceivable that appropriate incorporation of such prior knowledge may improve the likelihood of making genuine discoveries. Although a number of methods have been developed recently to prioritize genes using prior biological knowledge, such as pathways, most methods treat genes in a specific pathway as an exchangeable set without considering the topological structure of a pathway. However, how genes are related with each other in a pathway may be very informative to identify association signals. To make use of the connectivity information among genes in a pathway in GWAS analysis, we propose a Markov Random Field (MRF)


Abstracts model to incorporate pathway topology for association analysis. We show that the conditional distribution of our MRF model takes on a simple logistic regression form, and propose an iterated conditional modes algorithm as well as a decision theoretic approach for statistical inference of each gene’s association with disease. Simulation studies show that our proposed framework is more effective to identify genes associated with disease than single gene based method. We also illustrate the usefulness of our approach through its applications to a real data example.
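A back-of-the-envelope simulation in the spirit of the first abstract of this session (prediction with scores of tiny effects): a polygenic score built from noisy GWAS estimates is compared with the score based on the true effects. All sizes used here (1,000 SNPs, a GWAS of 50,000, effect scale 0.01) are arbitrary illustrative values, not estimates from the talk.

```r
## How estimation noise attenuates a polygenic risk score.
set.seed(6)
m     <- 1000
beta  <- rnorm(m, 0, 0.01)                         # tiny true per-allele effects
maf   <- runif(m, 0.1, 0.5)
se    <- 1 / sqrt(2 * 50000 * maf * (1 - maf))     # crude GWAS standard errors
betah <- beta + rnorm(m, 0, se)                    # estimated effects

n_test <- 2000
G <- matrix(rbinom(n_test * m, 2, rep(maf, each = n_test)), n_test, m)
prs_true <- G %*% beta                             # score using the true effects
prs_hat  <- G %*% betah                            # score available in practice
as.numeric(cor(prs_hat, prs_true))                 # attenuation from estimation noise
```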

Session 105: Statistical Methodology and Regulatory Issues in Drug Development in Multiple Regions Strategies for Multi-Regional Clinical Developments—A Quantitative Evaluation � William Wang1 , Huimin Liao2 and William Malbecq1 1 Merck & Co., Inc. 2 Fudan University william [email protected] In recent years, there has been an increasing trend by global pharmaceutical companies to develop pharmaceutical products simultaneously in multiple regions. This presentation will highlight opportunities and challenges in Asia Pacific, and discuss some regionspecific statistical considerations such as sample size requirements, statistical significance and regional consistency. We will also evaluate two different types of strategies: those using bridging trials versus those using multi-regional global trials. Simulation studies are used to assess how regional factors (e.g., sample size, treatment effect and within- and across-region variability) may impact the performance of these strategies. Empirical Shrinkage Estimator for Consistency Assessment of Treatment Effects in Multi-Regional Clinical Trials � Hui Quan1 , Mingyu Li2 , Weichung Joe Shih3 , Soo Peter Ouyang2 , Joshua Chen4 , Ji Zhang1 and Peng-Liang Zhao1 1 Sanofi-Aventis U.S. Inc. 2 Celgene Corporation 3 University of Medicine and Dentistry of New Jersey 4 Merck & Co., Inc. [email protected] Multi-regional clinical trials (MRCTs) have been widely used for efficient global new drug developments. Both a fixed effect model and a random effect model can be used for trial design and data analysis of a MRCT. In this paper, we first compared these two models in terms of the required sample size, Type I error rate control and the interpretability of trial results. We then apply the empirical shrinkage estimation approach based on the random effect model to two criteria of consistency assessment of treatment effects across regions. As demonstrated in our computations, compared to the sample estimator, the shrinkage estimator of the treatment effect of an individual region borrowing information from the other regions is much closer to the estimator of the overall treatment effect, has smaller variability and therefore provides much higher power for consistency assessment. A multinational trial example with time to event endpoint is used to illustrate the application of the method.
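A minimal sketch of the empirical-shrinkage idea in the Quan et al. abstract above: region-specific treatment-effect estimates are pulled toward the overall estimate in proportion to their imprecision, using a simple DerSimonian-Laird estimate of the between-region variance. The four regional estimates and standard errors are made up for illustration.

```r
## Shrinking regional treatment effects toward the overall effect.
theta_hat <- c(0.35, 0.10, 0.28, 0.45)   # observed regional treatment effects
se        <- c(0.12, 0.20, 0.15, 0.25)
w    <- 1 / se^2
mu   <- sum(w * theta_hat) / sum(w)      # overall (fixed-effect) estimate
Q    <- sum(w * (theta_hat - mu)^2)
k    <- length(theta_hat)
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))   # between-region variance

B <- tau2 / (tau2 + se^2)                # shrinkage weight for each region
theta_shrunk <- B * theta_hat + (1 - B) * mu
cbind(raw = theta_hat, shrunk = round(theta_shrunk, 3))
```

The shrunken estimates have smaller variability than the raw regional estimates, which is why they give higher power for consistency assessment.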

Session 106: Genomic Biomarker Applications in Clinical Studies Challenges in Use of Genomic Biomarker Qualification When Transitioning between Drug Development Phases—Experience


from Cardiovascular, Oncology and Infectious Diseases Peggy Wong Merck & Co., Inc. peggy [email protected] As the cost of genomic technology decreases, there has been a corresponding increase in the number of studies attempting to leverage molecular profiling technology to assign treatments or dose. The results of these studies from literature, academic partnerships or internal studies can influence drug development real time. One challenge is to how to guide the clinical teams in incorporating genomic biomarkers from preclinical to Phase III drug development phases by providing some sort of quantitative assessment of “power” to assess feasibility of progressing. There are also different challenges in using the same clinical trials for continued discovery. Examples from cardiovascular, oncology and infectious diseases areas will be used to drive the discussion. In addition, the results of the statistical testing of the genotypic association will be used to discuss clinical trial setup for next steps. A New Multi-Gene Classification Method for the Prediction of Drug Response � Haisu Ma1 and Zhaoling Meng2 1 Yale University 2 Sanofi-Aventis U.S. Inc. [email protected] One of the key goals of pharmacogenetics is the utilization of genetic variation to elucidate the inter-individual variation in drug treatment response. The completion of the Human Genome Project and the associated HapMap Project has greatly advanced the potential to develop predictive genetic tests that could maximize drug efficacy while minimizing toxicity via patient classification. However, a multi-gene classification and corresponding sample size/power justification methods are lacking. This study has three aims. First, we conducted literature reviews on recent advances in classification methods. We did a comprehensive survey on existing classifiers, compared their advantages and disadvantages based on the characteristics of the SNP data used in our project, as well as a number of criteria for computational and predictive performance. Second, we compared three candidate classifiers, which are logistic regression (using different penalties including LASSO and elastic net), Bayesian network and a modified version of PAM (prediction analysis of microarrays) using a real dataset. Third, the existence of potential placebo responders in the drug treatment group makes it difficult to assess the drug effect. We proposed a new multi-gene classification method based on adaptive LASSO, which is specifically designed to tackle this issue. We compared the new method with forward step-wise logistic regression with AIC, using simulated SNP and drug response data. We also made recommendations for future projects about the use of different classification methods. An Integrative Genomics Paradigm for the Discovery of Novel Tumor Subtypes and Associated Cancer Genes � Ronglai Shen1 , Adam B. Olshen2 , Qianxing Mo1 and Sijian Wang3 1 Memorial Sloan-Kettering Cancer Center 2 University of California, San Francisco 3 University of Wisconsin-Madison [email protected] High resolution microarray and next-generation sequencing platforms are powerful tools to investigate genome-wide alterations in gene expression, DNA copy number, DNA methylation, and other genomic events associated with a disease. An integrated genomic profiling approach measuring these omic data types simultaneously in the same set of samples would further reveal disease mechanisms ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts that would not be otherwise detectable with a single data type. In this talk, I will present a joint data analysis approach for subtype discovery and associated biomarkers. We formulated a novel framework using a joint latent variable model for integrating omic data sets. Sparse solutions are derived using penalized maximum likelihood approach. I will discuss results from analyzing multidimensional data sets generated from the Cancer Genome Atlas (TCGA) project.
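A generic adaptive-lasso classifier of the broad type discussed in the Ma-Meng abstract, assuming the glmnet package; the SNP data are simulated and the placebo-responder modelling that motivates the new method is not reproduced here.

```r
## Two-stage adaptive lasso for SNP-based prediction of drug response.
library(glmnet)

set.seed(8)
n <- 200; p <- 100
X <- matrix(rbinom(n * p, 2, 0.3), n, p)            # genotypes coded 0/1/2
y <- rbinom(n, 1, plogis(1.0 * X[, 1] - 1.2 * X[, 2] + 0.8 * X[, 3] - 1))

# Stage 1: a ridge fit supplies the adaptive weights
b_ridge <- as.numeric(coef(cv.glmnet(X, y, family = "binomial", alpha = 0),
                           s = "lambda.min"))[-1]
w <- 1 / (abs(b_ridge) + 1e-6)

# Stage 2: lasso with coefficient-specific penalties (the adaptive lasso)
fit <- cv.glmnet(X, y, family = "binomial", alpha = 1, penalty.factor = w)
which(as.numeric(coef(fit, s = "lambda.min"))[-1] != 0)   # selected SNPs
```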

Session 107: Statistical Modeling and Application in Systems Biology Statistical Modeling of RNA-Seq Ping Ma University of Illinois at Urbana-Champaign [email protected] With the rapid development of next-generation sequencing technologies, RNA-Seq has become a popular method for genomewide gene expression analysis. Compared to its hybridizationbased counterpart, e.g., microarray, RNA-Seq offers up to a singlenucleotide resolution signals. In particular, RNA-Seq sequences tens of millions of DNA fragments in parallel. After mapping these fragments, also called short reads, to reference genome (or transcripts), researchers get a sequence of read counts. That is, at each nucleotide position, researchers get a count which stands for the number of reads whose mapping starts at that position. Accurate quantification of gene expression relies on these read counts. In thei talk, I will present some recent work on modeling RNA-Seq read counts. Global Patterns of RNA and DNA Sequence Differences in the Human Transcriptome Mingyao Li University of Pennsylvania School of Medicine [email protected] The transmission of information from DNA to RNA is a critical process. It is assumed that DNA is faithfully copied into RNA. However, when we compared RNA sequences from human B cells of 27 individuals to the corresponding DNA sequences from the same individuals, we uncovered more than 10,000 exonic sites where the RNA sequences do not match that of the DNA. Validations using RNA sequences from other laboratories and re-sequencing of the DNA and RNA samples confirmed these findings. All 12 possible categories of discordances were found, with A-to-G and C-to-U being the most common. About 44% of the differences involved conversions between purines and pyrimidines. These differences were non-random as many sites were found in multiple individuals. The same differences were also found in primary skin and brain cells in a separate set of individuals and in expressed sequence tags from different tissues. Using data from mass spectrometry, we also detected peptides that are translated from the RNA sequences rather than the DNA sequences of genes. Thus, these widespread RNADNA differences in the human transcriptome provide a yet unexplored aspect of genome variation that affect gene expression and therefore phenotypic and disease manifestations.
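A textbook-style count model for RNA-Seq, included as background for the modelling discussion above: per-gene read counts are treated with a Poisson GLM and a library-size offset. This is not the base-level model presented in the talk; the simulated depths, group labels and fold change are illustrative.

```r
## Per-gene Poisson GLM for RNA-Seq counts with a sequencing-depth offset.
set.seed(9)
genes   <- 500
libsize <- c(8e6, 1.2e7, 9e6, 1.1e7)             # sequencing depths of 4 samples
group   <- factor(c("A", "A", "B", "B"))
rate    <- rgamma(genes, 2, 2) * 1e-5            # per-gene expression rates
fold    <- ifelse(seq_len(genes) <= 50, 2, 1)    # first 50 genes doubled in group B

counts <- sapply(1:4, function(j)
  rpois(genes, libsize[j] * rate * ifelse(group[j] == "B", fold, 1)))

y1  <- counts[1, ]                               # test gene 1 for differential expression
fit <- glm(y1 ~ group + offset(log(libsize)), family = poisson)
summary(fit)$coefficients["groupB", ]
```

With real data a negative binomial model (for example MASS::glm.nb) is the usual way to absorb extra-Poisson variation.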

Bayesian Inference of Interaction in HIV Drug Resistance � Jing Zhang1 , Tingjun Hou2 , Wei Wang3 and Jun Liu4 1 Yale University 2 Soochow University 3 University of California, San Diego 4 Harvard University [email protected] We propose a systematic approach for better understanding how HIV viruses employ various combinations of mutations to resist drug treatments, which is critical to developing new drugs and optimizing the use of existing drugs. By probabilistically modeling mutations in the HIV-1 protease or reverse transcriptase (RT) isolated from drug-treated patients, we present a statistical procedure that first detects mutation combinations associated with drug resistance and then infers detailed interaction structures of these mutations. The molecular basis of our statistical predictions is further studied using molecular dynamics simulations and free energy calculations. We have demonstrated the usefulness of this procedure on three HIV drugs (Indinavir, Zidovudine and Nevirapine), discovered novel interaction features between viral mutations induced by these drugs, and revealed the structural basis of such interactions.

Phylogenetic Path to Event (PhyloPTE) Samuel Handelman, � Joseph Verducci, Daniel Janies and Jesse Kwiek The Ohio State University [email protected] Several methods have been proposed for correlating genomic sequence patterns directly with phenotypes of similar organisms. However, the evolutionary relationships between organisms lead to non-independence among the sequences. A phylogenetic tree reconstruction uncovers sibling lineages where the phenotypes first start to differentiate, and, conditional on this tree, PhyloPTE adopts an additive hazard model to identify likely mutational paths along the tree as the phenotypes evolve. For example, the HIV-1 virus has a population structure reflecting both transmission between individuals and evolution of the HIV-1 quasispecies within each patient. Non-independence can introduce spuriously strong correlation between unrelated mutations, giving a false appearance of causation. These evolutionary relationships are an issue even in HIV-1, where recombination is rapid, and they are pervasive in humans, where linkage disequilibrium is extensive. In human disease studies, spurious correlation can sometimes be overcome by pedigree analysis or simple sibling studies: alleles common only in “sick” siblings are likely true causative alleles. PhyloPTE’s advantages include: incorporating information about branch lengths to infer mutational rates; computational speed practical for high-throughput (next-generation) sequence data; estimates of the relative influence of different effects; and improved precision even versus other tree-based methods: 50%-300% improvement in precision at the same recall, either to predict experimental correlations (obtained from STRING: http://string-db.org/) or in simulations under biologically reasonable parameters on HIV quasispecies sequence trees.

Session 108: Statistical Methodologies

Some Drop-the-Loser Designs for Monitoring Multiple Doses � Joshua Chen1 , David DeMets2 and Gordon Lan3 1 Merck & Co., Inc. 2 University of Wisconsin-Madison 3 Johnson & Johnson joshua [email protected] One of the key objectives of a Phase II clinical program is to identify appropriate dose for subsequent confirmatory studies. Traditionally, one single dose is selected for a Phase III confirmatory study at the end of dose-ranging Phase IIB. It is not uncommon that at the end of Phase IIB, choices are narrowed down to 2-3 doses but the team cannot pick one single good dose which is highly likely to be efficacious



Abstracts and also to have a favorable safety profile. A few candidate doses enter the confirmatory study and ineffective and/or toxic doses compared to the control may be dropped at the interim analyses as the study continues (“drop-the-loser“). The study may be stopped once the accumulated data have demonstrated convincing efficacy and an acceptable safety profile for one or more doses. Several “drop-theloser“ designs and their characteristics will be discussed. Linear and Nonlinear Boundary Crossing Probabilities for Brownian Motion and Its Application in Predicting Bankruptcy James C. Fu University of Manitoba [email protected] We propose a new method to obtain the boundary crossing probabilities or the first passage time distribution for linear or non-linear boundaries for the Brownian motion. The method also covers certain classes of stochastic processes associated with Brownian motion. The basic idea of the method is based on being able to construct a finite Markov chain and the boundary crossing probability of Brownian motion is cast as the limiting probability of the finite Markov chain entering a set of absorbing states induced by the boundaries. Numerical results for various types of boundaries studied in the literature are provided in order to illustrate our method. A new method based on boundary crossing probability is proposed to predicting bankruptcy. A set of data containing 107 manufacture companies in U.S.A. has been used to verify the method. Statistical Properties of Parasite Density Estimators in Malaria and Field Applications � Imen Hammami1 , Andr`e Garcia2 and Gr`egory Nuel1 1 Applied Mathematics at Paris Descartes 2 Institut de Recherche pour le D`eveloppement [email protected] Malaria is a global health problem responsible for nearly 3 million deaths each year, an average of one person every 12s. In addition, 300 to 500 million people contract the disease each year, mostly in resource poor areas where malaria is endemic. The level of infection, expressed as the parasite density (PD), is classically defined as the number of asexual forms of Plasmodium falciparum relative to a microliter of blood. Microscopy of Giemsa-stained thick blood films is the gold standard for parasite enumeration in case of febrile episodes. PD estimation methods usually involve threshold values as the number of white blood cells (WBC) counted and the number high power fields (HPF) seen. However, the statistical properties of the PD estimates generated by these methods have been generally overlooked. Here, we discuss commonly used threshold-based counting techniques in the context of the Plasmodium falciparum PD estimation. We study the statistical properties (bias, variance, False-Positive Rates...) of the PD estimates according to varying threshold values. Moreover, we evaluate the cost-effectiveness of the estimation methods and we assess their reliability by studying the level of left-censoring; means diagnosing by mistake the PD as null. However, different threshold values may be fixed, which raises questions regarding accuracy and reproducibility. The question is to which extent the threshold values would influence the variability in PD estimates. To understand how the thresholds involved in parasite enumeration methods contribute to the magnitude of discrepancies in density determination, we study their impact in measuring differences in readings slide. 
Furthermore, we give more insights on the behavior of measurement errors according to varying threshold values and on what would be the optimal values that minimize the variability. Keywords: Threshold-based counting techniques, par-


asite density estimators, bias, variance, False-Positive Rates, costeffectiveness, malaria epidemiology. One-Step Weighted Composite Quantile Regression Estimation of DTARCH Models Jiancheng Jiang University of North Carolina at Charlotte [email protected] In modeling volatility in financial time series, the double-threshold autoregressive conditional heteroscedastic (DTARCH) model has been demonstrated as a useful variant of the autoregressive conditional heteroscedastic (ARCH) models. In this talk we propose a one-step weighted composite quantile regression method for estimating the DTARCH model. This method involves a sequence of weights and takes a data-driven weighting scheme to maximize the asymptotic efficiency of the estimators. Under regularity conditions, we establish asymptotic distributions of the proposed estimators. It is demonstrated that the proposed estimators are robust and easy to implement and attain nearly the same efficiency as the oracle maximum likelihood estimator for a variety of error distributions including the normal, mixed-normal, Student’s t, Cauchy distributions and etc.. Simulations are conducted to compare the performance of different estimators, and the proposed approach is used to analyze the S&P 500 Composite index, which endorse our theoretical results.
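A rough numerical version of the finite-Markov-chain idea in Fu's abstract earlier in this session: Brownian motion is approximated by a symmetric random walk on a grid, states at or above the boundary are made absorbing, and the absorbed mass accumulates into the crossing probability. The linear boundary b(t) = 1 + 0.5 t, the grid limits and the step counts are illustrative choices, not the construction in the paper.

```r
## Boundary-crossing probability for Brownian motion via an absorbing random walk.
cross_prob <- function(b = function(t) 1 + 0.5 * t, Tmax = 1, nstep = 2000) {
  dt   <- Tmax / nstep
  dx   <- sqrt(dt)                         # walk step matched to BM variance
  grid <- seq(-6, 6, by = dx)
  p <- numeric(length(grid))
  p[which.min(abs(grid))] <- 1             # start the walk at 0
  crossed <- 0
  for (i in 1:nstep) {
    p <- 0.5 * c(p[-1], 0) + 0.5 * c(0, p[-length(p)])   # one random-walk step
    hit <- grid >= b(i * dt)                             # states beyond the boundary
    crossed <- crossed + sum(p[hit])
    p[hit] <- 0                                          # absorb them
  }
  crossed
}
cross_prob()
1 - pnorm(1.5) + exp(-1) * pnorm(-0.5)     # exact value (about 0.18) for this boundary on [0, 1]
```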

Session 109: Functional and Nonlinear Curve Analysis Predict the Effectiveness of Physical Therapy in Treating Lumber Disc Hernation Based on a LOGISTIC Curve Model � Xueying Li1 , Lin Wang2 , Zhen Huang2 , Xiaoping Kang2 and Chen Yao3 1 Peking University First Hospital 2 Department of Rehabilitation, Peking University First Hospital 3 Peking University First Hospital [email protected] Back ground: Physical therapy is concerned as an effective way of treating chronic pain. Nevertheless the processes of pain release is quite different among patients. Therefore the prediction of the efficacy of the prescription is admired. In previous study, we have successfully predicted the effectiveness of physical therapy of the whole period of treatment, by using the LOGISTIC curve model with the first 4 sections. In this study, we will try to use the LOGISTIC curve model in the prediction of the effectiveness of physical therapy with lumber disc hernation. Objective: Using the LOGISTIC curve model to describe the pain release processes of lumber disc hernation patient during physical therapy and evaluate the predicting ability of the model. Methods: 31 lumber disc hernation patients were involved in this study. Multiple physical therapies was administered to each patients for 10 sessions. A visual analogue scale was used to measure the pain intensity before each secssion. The LOGISTIC curve model was used in the regression analysis. The first 2 to 9 pain measurements were involved in different predictive models. And compare the predictive results from different models. Results: During the treatment processes, the pain scale reduced form 6.90 (3.04,10.00) to 4.40 (0.01,7.90). In regression analysis, the global R2 was 0.719. In predicting analysis, the more measurements involved in the predictive model, the more effective the prediction will be. When the model only include less than 4 measurements, the correct prediction is about 70%. While more than 5 measurements are used in the model, more than 80% of patients got the right prediction of the whole period of treatment. Conclusion: ICSA Applied Statistics Symposium 2011, NYC, June 26-29

Abstracts The LOGISTIC model can describe and predict the effectiveness of multiple physical therapy on lumber disc hernation patients, by 4 sessions of treatment. Modeling and Forecasting Functional Time Series � Cong Feng, Lily Wang and Lynne Seymour Department of Statistics, University of Georgia [email protected] A novel method is proposed for forecasting a time series of smooth curves, using functional principal component analysis in combination with time series modeling and scores forecasting. We achieve the smoothing, dimension reduction and prediction at the same time with the expedient computation. The work is motivated by the demand to forecast the time series of economic functions, such as Treasury bond yield curves. Extensive simulation studies have been carried out to compare the prediction accuracy of our method with other competitor’s methods. The proposed methodology is applied to forecast the yield curves of government treasury bond. Joint Modeling of Longitudinal and Survival Data Using Markov Threshold Regression � Michael Pennell1 , Xin He2 and Mei-Ling Ting Lee2 1 The Ohio State University 2 University of Maryland [email protected] In biomedical studies examining time to a health-related event, precision of a treatment effect is often improved through modeling the survival data jointly with a longitudinal biomarker. A common approach is to use a proportional hazards model with the hazard dependent upon either random effects shared across the survival and longitudinal model or the expected value of the biomarker at each time point. In this talk, we consider an alternative method which jointly models the longitudinal marker process and a latent health process which fails once it hits a boundary value. In addition to being a conceptually appealing model, the approach does not require the assumption of proportional hazards and differentially estimates the effects of covariates on baseline health and the rate at which health degrades. Our method involves decomposing the health process into a series of Markov transitions resulting in a tractable likelihood function. A two stage approach is used for estimation in which we first model the biomarker process and generate empirical Bayes estimates of the true biomarker levels over time. Then, in the second stage, we maximize the likelihood for the health process conditional on the biomarker levels. The method is applied to longitudinal weight and survival data from a National Toxicology Program study. Semiparametric Bayes Local Additive Models for Longitudinal Data � Zhaowei Hua1 , Hongtu Zhu1 and David B. Dunson2 1 University of North Carolina at Chapel Hill 2 Duke University [email protected] In longitudinal data analysis, there is a great interest in assessing the impact of predictors on the time-varying trajectory in a response variable. In such settings, an important issue is to account for heterogeneity in the shape of the trajectory among subjects, while allowing the impact of the predictors to vary across subjects. We propose a flexible semiparametric Bayes approach for addressing this issue relying on a local partition process prior, which allows flexibly local borrowing of information across subjects. Local hypothesis testing and confidence bands are developed for the identification of time windows across which a predictor has a significant impact, and ICSA Applied Statistics Symposium 2011, NYC, June 26-29
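An illustrative four-parameter logistic fit with a simple case-resampling bootstrap for the ED50, in the spirit of the last abstract above. The simulated assay, the equal weights and B = 500 resamples are placeholder choices; the weighting schemes and adjusted bootstrap intervals discussed in the talk are not reproduced.

```r
## 4PL dose-response fit and a percentile bootstrap interval for ED50.
set.seed(11)
dose <- rep(2^(0:7), each = 3)
resp <- 0.2 + (1.8 - 0.2) / (1 + (dose / 20)^1.3) + rnorm(length(dose), sd = 0.05)
dat  <- data.frame(dose, resp)

fit4pl <- function(d)
  nls(resp ~ lower + (upper - lower) / (1 + (dose / ed50)^slope), data = d,
      start = list(lower = 0.2, upper = 1.8, ed50 = 15, slope = 1))

fit <- fit4pl(dat)
coef(fit)["ed50"]

ed50_boot <- replicate(500, {
  db <- dat[sample(nrow(dat), replace = TRUE), ]
  fb <- tryCatch(fit4pl(db), error = function(e) NULL)
  if (is.null(fb)) NA else coef(fb)["ed50"]
})
quantile(ed50_boot, c(0.025, 0.975), na.rm = TRUE)   # percentile bootstrap interval
```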

adjusting for multiple comparisons. Posterior computation proceeds via an efficient MCMC algorithm using the exact block Gibbs sampler. The methods are assessed using simulation studies and applied to a yeast cell-cycle gene expression data set. Issues and Adjustment for Bioassay Nonlinear Dose Response Parameter Estimate � Rong Liu1 , Jane Liao1 and Jason Liao2 1 Merck Research Laboratories 2 Teva Pharmaceuticals rong [email protected] Nonlinear curve such as four parameter logistic function is commonly used to describe dose response relationship for bioassay analysis. In bioassay dose response, response variability is often a function of the response and correct weighting is crucial for ED50 and RP confidence interval estimation for nonlinear model. Standard statistics package such as SAS proc nlmixed gives the flexibility to estimate variance function and is preferable when multiple runs are available. Bootstrap confidence interval should be used when weighting function is the inverse of empirical variance at each dilution level. Zeng & Davidian (1996) and Carroll & Ruppert (1991) shows that bootstrap adjusted intervals achieve a higher degree of accuracy than usual Wald intervals in calibration study. In this presentation, possible adjustments on bootstrap confidence interval to achieve desired nominal level will be presented and the application of adjusted bootstrap confidence interval will be illustrated using a real example.

Session 110: Statistical Methods for High-Dimensional Data or Large Scale Studies The Effect of Heterogeneity on Statistical Evaluation of Interventions: An Empirical Study � Depeng Jiang1 , Debra J. Pepler2 and Leena K. Augimeri3 1 University of Manitoba 2 York University 3 Child Development Institute [email protected] In intervention studies, there are many forms of unobserved heterogeneity among participants. Such variability (heterogeneity) is often overlooked in the analysis of intervention data, because they are presented for the “average“ participant. In this paper, we illustrate the influence of population heterogeneity on intervention evaluation R Unusing data from a retrospective clinical study of the SNAP� der 12 Outreach Project for boys exhibiting serious antisocial beR ORP haviour problems. The 12-week, multi-component SNAP� is provided in a community-based outpatient setting and run by the Child Development Institute in Toronto, Canada. In addition to two core components offered to all children and their families (12-week R Boys Group and concurrent SNAP� R Parent Group), chilSNAP� dren and their families were able to access additional components such as individual befriending. Those who received just the core components were labeled as the standard treatment group, while the enhanced treatment group included those who also received one or more individual befriending sessions. The data were first analyzed under the assumption of population homogeneity using a random effect mixed model approach. We only found small effect size for the R ORP in terms of reduction in delinquency. The enhanced SNAP� data were then analyzed with the assumption of population heterogeneity using the growth mixture modeling approach. The growth

117

Abstracts R ORP made a submixture model shows that the enhanced SNAP� stantial difference in the treatment effects for the high-risk class, and strengthened the treatment effect significantly for the medium class. Further the Cox regression analyses stratified by the risk class indicated that the hazard of early onset of criminal offence for the enhanced group is 0.68 times (95% CI: 0.48-0.99) less likely than that of the standard groups.

Multilayer Correlation Structure of Microarray Gene Expression Data � Linlin Chen1 , Lev Klebanov2 and Anthony Almudevar3 1 Rochester Institute of Technology 2 Charles University 3 University of Rochester [email protected] In this talk, we focus on possible causes of between-gene dependencies and their effects on the performance of gene selection procedures. We show that there are at least two “noise-type”ˇtreasons for high correlations between gene expression levels. First is of technical character, and is connected to a random character of the number of cells used to prepare microarray. Another reason is the heterogeneity of cells in a tissue. Both reasons allow one to make some predictions, which are verified on real data. Bayesian Model Averaging as a Natural Choice for Differential Gene Expression Studies � Xi Kathy Zhou1 , Fei Liu2 and Andrew J. Dannenberg1 1 Weill Cornell Medical College 2 IBM Thomas J. Watson Research Center [email protected] Differentially expressed (DE) gene detection represents one of the key objectives of many microarray studies. As more studies are carried out with observational rather than well controlled experimental samples, it becomes important to evaluate and properly control the impact of sample heterogeneity on DE gene finding. Most methods for DE gene detection to date can be considered as single model approaches since they rely on the ranking of all the genes according to the value of a statistic derived from a single model for two or more group comparisons, with or without adjustment for other covariates. Such single model approaches are conceptually flawed since they unavoidably result in model misspecification. We show the evidence that differential gene expression study based on high throughput data intrinsically requires a multi-model handling. To properly control for sample heterogeneity and to provide a flexible and coherent framework for identifying simultaneously DE genes associated with a single or multiple sample characteristics and/or their interactions, we propose a Bayesian model averaging approach with an empirical prior model probability specification. We demonstrate through simulated microarray data that this approach improves the performance of DE gene detection comparing to the single model approaches. The flexibility of this approach is illustrated through analysis of data from two observational microarray studies. Importance of Statistical Pre-Processing of TMA Biomarker Data for Possible Spatial Bias � Daohai Yu1 , M. J. Schell1 , Z. Zheng1 , B. Rawal1 and G. Bepler2 1 H. Lee Moffitt Cancer Center & Research Institute 2 Karmanos Cancer Institute, Wayne State University [email protected] Background: Statistical pre-processing of expression data from gene microarrays is routinely performed. Similar procedures for biomarker data from tissue microarrays (TMAs) may be necessary for proper analysis.

118

Methods: Data from 58 TMA runs of the biomarker expression from 187 early stage NSCLC patients using the AQUA system were studied to assess the optimal power transformation (raw, log, squareroot, or quarter-root), and possible presence of spatial bias (linear, piecewise linear, or multiple changepoint) across either the 20 rows or 17 columns of the TMA. Optimal transformations minimize the undue influence of individual data points and attempt to homogenize the variability of data regardless of the AQUA score. When multiple transformations were suitable, they were favored in decreasing order as given above. Results: The optimal transformations (N/%) for the runs were: log (27/47%), square-root (22/38%), quarter-root (5/9%), and raw (4/7%). Row and column spatial bias was seen in 25 (43%) and 27 (47%) of the runs, with 40 (69%) of runs having at least one such bias. When present, the row and column biases accounted for 561% (Median=17%) and 6-55% (Median=14%) of the variation in the data. In most row bias cases (20/25), the effects appeared to be linearly decreasing across the rows. Column bias seemed to be primarily a multiple changepoint step function, with higher scores in the middle columns (C), particularly C 8-15. Conclusion: In most cases (93%), the biomarker data needed to be transformed before statistical analysis. Moreover, 2/3 of runs had some spatial bias, likely due to imperfect applications of reagent to a TMA slide. The column bias is partly due to the placement of control tissue (not included in the scores) in rows 11-20 by C 8-15. Thus, statistical pre-processing may be imperative. Routine examination of TMA data for possible spatial bias should be conducted and adjusted for when present. When appropriate, the laboratory technique should be altered to minimize the occurrence of such spatial bias. A Likelihood-Based Framework for Association Analysis of Allele-Specic Copy Numbers � Yijuan Hu, Wei Sun and Danyu Lin Department of Biostatistics, University of North Carolina at Chapel Hill [email protected] Copy number variants (CNVs) and single nucleotide polymorphisms (SNPs) co-exist throughout the human genome and jointly contribute to phenotypic variations. Thus, it is desirable to consider both types of variations, as characterized by allele-specific copy numbers (ASCNs), in association studies of complex human diseases. Current SNP genotyping technologies can simultaneously capture the CNV and SNP information. The common practice of first calling ASCNs from the SNP array data and then using the ASCN calls in downstream association analysis has important limitations. First, the association analysis may not be robust to the differential errors between cases and controls caused by the differences in DNA quality or handling. Second, the phenotypic information is not used in the calling process and the uncertainties in the ASCN calls are ignored. We present a general framework for the integrated analysis of CNVs and SNPs in association studies, including analysis of total copy numbers as a special case. Our approach combines the ASCN calling and association analysis into a single step while allowing for differential errors. We construct likelihood functions that properly account for case-control sampling and measurement errors. We establish the asymptotic properties of the maximum likelihood estimators and develop EM algorithms to implement the proposed inference procedures. 
A Likelihood-Based Framework for Association Analysis of Allele-Specific Copy Numbers
Yijuan Hu, Wei Sun and Danyu Lin
Department of Biostatistics, University of North Carolina at Chapel Hill
[email protected]
Copy number variants (CNVs) and single nucleotide polymorphisms (SNPs) co-exist throughout the human genome and jointly contribute to phenotypic variation. Thus, it is desirable to consider both types of variation, as characterized by allele-specific copy numbers (ASCNs), in association studies of complex human diseases. Current SNP genotyping technologies can simultaneously capture the CNV and SNP information. The common practice of first calling ASCNs from the SNP array data and then using the ASCN calls in downstream association analysis has important limitations. First, the association analysis may not be robust to differential errors between cases and controls caused by differences in DNA quality or handling. Second, the phenotypic information is not used in the calling process and the uncertainties in the ASCN calls are ignored. We present a general framework for the integrated analysis of CNVs and SNPs in association studies, including the analysis of total copy numbers as a special case. Our approach combines the ASCN calling and the association analysis into a single step while allowing for differential errors. We construct likelihood functions that properly account for case-control sampling and measurement errors. We establish the asymptotic properties of the maximum likelihood estimators and develop EM algorithms to implement the proposed inference procedures. The advantages of the proposed methods over existing ones are demonstrated through realistic simulation studies and an application to a genome-wide association study of schizophrenia. Extensions to next-generation sequencing data are discussed.
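The EM algorithms referred to above are specific to the authors' case-control measurement-error likelihood; purely to show the flavor of such an algorithm, here is a generic two-component Gaussian-mixture EM on hypothetical intensity data, not the authors' model.

import numpy as np
from scipy.stats import norm

def em_two_component(x, n_iter=100):
    # EM for a two-component Gaussian mixture: latent class z, observed intensity x
    mu = np.array([x.min(), x.max()], dtype=float)
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] = P(z_i = k | x_i, current parameters)
        dens = np.column_stack([pi[k] * norm.pdf(x, mu[k], sigma[k]) for k in range(2)])
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted updates of mixing proportions, means and variances
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

# hypothetical intensities drawn from two latent states
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 0.3, 300), rng.normal(1.0, 0.3, 200)])
print(em_two_component(x))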

Session 111: Special presentation: CDISC Standard Development, Implementation Strategy, and Case Study
Panel presentation
Frank Newby, Steve Kopko, Sunny Xie and Jian Chen
Department of Epidemiology and Population Health, Albert Einstein College of Medicine of Yeshiva University
[email protected]
CDISC is a global, open, multidisciplinary, non-profit organization that has established standards to support the acquisition, exchange, submission and archive of clinical research data and metadata. CDISC now has a full set of production standards to support the research process from Protocol through Analysis and Reporting. Standards that support efficacy data related to specific diseases/therapeutic areas are being developed to augment the CDISC Safety Data domains, while the existing standards are refined and maintained. Frank will present current CDISC activities with regulatory authorities, especially FDA, as well as the development roadmaps. (Frank)
This presentation will discuss how to include CDISC standardization considerations in the clinical study life cycle from protocol concept to Clinical Study Report. We will discuss the application of CDISC standards at different stages of compound development. We will also discuss potential strategies that medium and small pharmaceutical and biotech companies can use to apply CDISC standards. (Steve & Sunny)
A metadata-driven solution is crucial to fully realize the promise of CDISC standards and to manage the complexity of a clinical study. An in-depth discussion of implementation details through a case study will illustrate how CDISC standards can greatly enhance the quality and efficiency of all stages of a clinical trial. (Jian)
Bios of presenters:
• Frank Newby, Chief Operating Officer, CDISC. Frank Newby is the COO for CDISC. He holds degrees in both Biology and Education from East Stroudsburg University and has more than 30 years of experience in the pharmaceutical industry. Prior to joining CDISC he held the positions of Vice President, Information Technology and Data Management at SCIREX Corporation; Director, Information Technology and Data Management at Cell Pathways, Inc.; Director, Worldwide Clinical Systems Development for GlaxoSmithKline; and a number of other positions in the areas of Clinical Information Management for GSK, Aventis, Merck and J&J.
• Stephen Kopko, CDISC. Stephen Kopko is a results-oriented strategic leader who utilizes expertise in Clinical Systems, Statistical Programming, Programming & Data Standards, Data Warehousing and Electronic Regulatory Submissions to develop responsive and focused organizations. He is recognized as an inspiring leader, strategic thinker, trusted mentor and collaborator in internal and external organizations, who combines vision, interpersonal skills and business savvy with an established track record of delivering quality solutions. Currently, Stephen is an independent consultant to the CDISC organization. Previously, he held various managerial roles, including senior director of Biostatistics System Development at Pfizer (previously Wyeth Research), senior director of Clinical Programming at Wyeth Research, senior director at the R. W. Johnson Pharmaceutical Research Institute, and group director at SmithKline Beecham Pharmaceuticals. In these roles, he directed all systems development and technical support services, encompassing standard software development and maintenance, and developed and maintained programming and data standards for use in global biostatistics & programming and global clinical team functions. He led implementation efforts on Data Warehousing, the Data Migration Factory, Data Lifecycle Standard Programming, outsourced CDISC SDTM Mapping Specifications, and e-CTD Data Component CDISC SDTM Submission development and maintenance. Steve holds an MS in Computer Science from Villanova University. He was Chair of the CDISC Advisory Board (CAB) for a two-year period (2008-2009) and has provided strategic planning and consulting services on various technical computing topics.
• Sunny Xie, Senior Director, Process Optimization in Clinical Development Operations and Biometrics, Shire Pharmaceuticals. Sunny Xie is the Senior Director for Process Optimization in Clinical Development Operations and Biometrics at Shire Specialty Pharma, with a focus on identifying cross-functional opportunities for process improvement within R&D. He is also managing IRT services at Shire. Prior to joining Shire, Sunny worked for more than 15 years at several mid-size to large pharmaceutical companies in leadership roles, including Forest Laboratories, Eli Lilly & Company, and Pfizer (formerly Pharmacia & Upjohn). Notably, he spent more than five years as Executive Director, Head of the Statistical Programming Department at Forest Laboratories, where he led a department of 38 employees and more than 10 contractors supporting all projects, submissions, company acquisitions, and product in-licensing. During that time, Sunny was promoted and recognized as an effective leader and for his contributions to cross-functional operational processes. Sunny holds an MS in Statistics from Sam Houston State University.
• Jian Chen, President, EDETEK, Inc. Jian is an industry veteran with 20 years of experience in statistics, information engineering, and business intelligence. Previously, Jian was the co-founder and CTO of Q-Square Business Intelligence, Inc., a CRO company also based in New Jersey. Before that, Jian held many technical and managerial roles spanning diverse industries, including financial services, insurance, and consulting, such as VP at Merrill Lynch, Manager at BearingPoint, Enterprise Database Manager at Guardian Life Insurance, and Senior Statistician at Capital One. Jian has solid domain knowledge in the pharmaceutical, financial, and insurance industries, as well as deep and broad technology experience. He has leveraged advances in information technology to revolutionize existing business processes and data management. His patented metadata-based solution provides a seamless integration of process and data (both structured and unstructured), allowing users to focus on business applications rather than technology implementation. For the last six years, his clinical data management solutions have created long-term sustainable value for his clients. Jian holds an MBA in Finance and Strategy from Columbia University and an MS in Biostatistics and Biomedical Engineering from Virginia Commonwealth University School of Medicine. Jian is an active member of various CDISC teams, such as CDASH, SDS, ADaM, and SHARE.

Index of Authors
Abdala, N, 30, 84 Accomando, WP, 35, 104 Acton, K, 36, 107 Addona, V, 27, 74 Agrawal, N, 25, 68 Airoldi, E, 38, 112 Al-Khalidi, H, 22, 59 Albert, P, 17, 35, 40, 101 Alemayehu, D, 22, 58 Alizadeh, A, 22, 56 Almudevar, A, 39, 118 Amlie-Lefond, CM, 37, 111 Anderson, K, 24, 65 Antao, VC, 28, 78 Anziano, R, 18, 43 Apanasovich, T, 35, 104 Asgharian, M, 27, 74 Atherton, J, 27, 74 Augimeri, LK, 39, 117 Baek, S, 26, 72 Bain, R, 34, 98 Ball, R, 25, 67 Banerjee, M, 21, 55 Banerjee, S, 24, 65 Bang, H, 33, 94 Bar, HY, 25, 97 Beals, J, 36, 107 Belin, T, 31, 89 Bencaz, AF, 28, 78 Bepler, G, 39, 118 Berg, P, 31, 87 Berlin, J, 22, 58 Blocker, A, 38, 112 Bondell, H, 18, 43 Bornkamp, B, 24, 37, 64, 110 Boscardin, J, 31, 89 Bozdogan, H, 22, 57 Brackbill, RM, 28, 78 Branford, S, 28, 78 Braun, WJ, 20, 50 Bravo, HC, 29, 81 Bresnick, EH, 29, 80 Bretz, F, 24, 30, 37, 64, 83, 110 Bridge, P, 18, 43 Bura, E, 34, 99 Burnett, D, 36, 108

Cabrera, J, 34, 100 Cai, T, 29, 32, 35, 36, 80, 92, 102, 107 Cao, G, 35, 105 Cao, J, 23, 62 Cao, X, 34, 99 Caporas, NE, 17, 40 Caporaso, N, 28, 76 Caragea, P, 27, 72 Carroll, R, 37, 109 Castagno, D, 28, 77 Castro, MD, 22, 59 Chan, IS, 27, 73 Chandra, A, 25, 67 Chang, C, 22, 59 Chang, MN, 30, 85 Chang, W, 21, 53, 54 Chatterjee, N, 38, 113 Chen, B, 21, 55 Chen, C, 22, 34, 59, 99 Chen, H, 38, 113 Chen, J, 21, 36, 38, 38, 39, 54, 108, 114, 115, 119 Chen, JJ, 19, 49 Chen, L, 18, 32, 39, 46, 93, 118 Chen, M, 22, 22, 31, 32, 38, 59, 59, 88, 89, 113 Chen, N, 33, 95 Chen, P, 22, 56 Chen, Q, 32, 36, 89, 106 Chen, R, 33, 34, 94, 100 Chen, S, 35, 103 Chen, X, 18, 22, 22, 42, 57, 57 Chen, Y, 21, 53 Chen, Z, 35, 37, 101, 110 Cheng, B, 37, 112 Cheng, C, 28, 79 Cheng, G, 33, 94 Cheng, J, 34, 100 Cheng, Y, 23, 61 Cheung, Y, 22, 56 Cheung, YK, 37, 112 Chi, GYH, 28, 77 Chi, Z, 31, 88 Cho, J, 38, 113

Christensen, BC, 35, 104 Chu, H, 18, 45 Chun, H, 32, 93 Chung, D, 29, 80 Claggett, B, 28, 36, 77, 107 Coad, S, 22, 56 Cone, JE, 28, 78 Conklin, DJ, 20, 50 Cook, R, 21, 23, 55, 62 Coopersmith, L, 36, 107 Cotton, C, 23, 62 Critchley, F, 25, 68 Cronin, K, 28, 76 Cross, J, 29, 82 Crowell, MD, 28, 78 Cuerden, M, 23, 62 Cui, L, 31, 86 Cui, TQ, 24, 63 Cui, Y, 25, 68 Dafni, U, 20, 51 Dai, J, 26, 71 Dai, Y, 29, 80 Daniels, M, 27, 29, 75, 82 Dannenberg, AJ, 39, 118 Daye, ZJ, 24, 63 Dean, C, 27, 75 DeMets, D, 38, 115 Deng, D, 32, 92 Deng, X, 18, 45 Dette, H, 24, 64 Deuster, PA, 36, 108 Dewey, C, 29, 80 Dey, DK, 37, 111 Dicker, L, 36, 105 Diggle, P, 24, 65 Ding, X, 19, 46 Ding, Y, 21, 55 Dinse, G, 19, 47 Dmitrienko, A, 19, 19, 30, 48, 48, 83 Dong, T, 32, 91 Dong, X, 21, 25, 25, 54, 66, 66 Dong, Y, 25, 68 Donoho, D, 17 Dorie, V, 17, 40 Dragalin, V, 17, 31, 41, 86 Du, P, 20, 29, 50, 79


Duan, Y, 31, 88 Dubin, JA, 28, 75 Dunson, DB, 39, 117 Dupuis, D, 17, 42 Eickhoff, J, 33, 94 Elashoff, R, 27, 27, 75, 75 Elliott, MR, 36, 106 Entsuah, R, 30, 83 Evans, S, 19, 48 Ezzalfani, M, 23, 59 Fan, C, 25, 25, 68, 68 Fan, J, 18, 19, 19, 45, 47, 47 Fan, L, 26, 69 Fan, SK, 22, 57 Fan, Y, 24, 63 Fang, H, 32, 32, 92, 92 Fang, Y, 36, 106 Feng, C, 28, 39, 77, 117 Feng, Y, 19, 19, 47, 47 Feuer, EJ(, 28, 76 Field, C, 28, 78 Fine, J, 26, 70 Finkelstein, MO, 23, 61 Finley, AO, 24, 65 Fiorentino, R, 18, 43 Flaherty, S, 33, 94 Fleming, T, 22, 59 Follmann, D, 29, 82 Forshee, R, 18, 42 Freidlin, B, 33, 95 Frimpong, E, 18, 43 Fu, G, 18, 44 Fu, JC, 39, 116 Fu, R, 37, 111 Fu, Y, 20, 52 Gail, MH, 17, 19, 24, 38, 40, 46, 66, 113 Galfalvy, H, 27, 74 Gao, C, 28, 79 Gao, S, 21, 53 Garcia, A, 39, 116 Gastwirth, J, 23, 23, 60, 61 Gavrilov, Y, 26, 71 Gaydos, B, 31, 86 Geller, NL, 17, 40 Gelman, A, 17, 17, 40, 41

Gentles, A, 22, 56 Ghysels, E, 18, 42 Gibbons, RD, 27, 75 Gilmore, JH, 20, 52 Goel, S, 17, 38, 41, 112 Goldberg, JD, 33, 96 Goldstein, DG, 38, 112 Gonen, M, 28, 78 Gordon, R, 18, 43 Gorrostieta, C, 37, 109 Graubard, B, 21, 54 Graubard, BI, 28, 76 Gray, R, 20, 51 Greene, RL, 34, 100 Griffith, S, 25, 98 Griffith, SD, 25, 97 Gu, W, 18, 45 Guhaniyogi, R, 24, 65 Guo, J, 18, 24, 46, 66 Guo, W, 32, 92 Guo, X, 21, 52 Hager, GL, 26, 72 Hall, CB, 20, 51 Hammami, I, 39, 116 Han, X, 18, 45 Handelman, S, 38, 115 Hang, Y, 17, 41 Hannig, J, 32, 91 Hao, N, 19, 47 Haran, M, 27, 72 Hardle, WK, 36, 105 He, W, 24, 34, 65, 99 He, X, 19, 39, 47, 117 He, Y, 31, 33, 89, 94 Heimer, R, 30, 84 Heitjan, DF, 25, 25, 30, 85, 86, 97, 98 Henderson, D, 37, 110 Hentz, JG, 28, 78 Hill, J, 17, 41 Ho, K, 23, 62 Ho, M, 33, 94 Hochner, H, 21, 54 Hofman, JM, 17, 41 Holsinger, K, 37, 111 Hong, Y, 22, 59 Hou, T, 38, 115 Hou, X, 25, 68 Houseman, EA, 35, 104 Hsiao, C, 21, 53 Hsu, C, 17, 42 Hsu, CH, 37, 110 Hsu, H, 30, 84 Hsu, L, 26, 70 Hu, I, 18, 23, 44, 62 Hu, J, 23, 62 Hu, N, 20, 50 Hu, T, 18, 44 Hu, Y, 39, 118 Hu, Z, 29, 82

Hua, Z, 39, 117 Huang, C, 35, 101 Huang, J, 23, 29, 62, 81 Huang, L, 18, 18, 36, 43, 43, 106 Huang, W, 21, 53 Huang, X, 30, 84 Huang, Z, 39, 116 Hughes, J, 27, 72 Hughes, S, 19, 49 Hughes, TP, 28, 78 Hung, HJ, 21, 31, 53, 86 Hwang, J, 18, 44 Hwang, W, 34, 101 Ibrahim, JG, 22, 32, 59, 89 Ireland, B, 29, 82 Irizarry, R, 29, 81 James, G, 24, 24, 63, 64 Jang, H, 20, 50 Janies, D, 38, 115 Jeng, XJ, 29, 80 Ji, P, 29, 81 Ji, Y, 37, 112 Jiang, D, 39, 117 Jiang, J, 28, 39, 76, 116 Jiang, L, 36, 107 Jiang, W, 21, 33, 54, 95 Jiang, X, 20, 52 Jimenez, R, 24, 24, 65, 66 Jin, B, 25, 31, 68, 88 Jin, J, 29, 32, 81, 93 Jin, Z, 34, 100 Johnson, B, 19, 49 Joo, J, 21, 54 Jung, S, 30, 85 Kai, B, 35, 103 Kaiser, MS, 26, 72 Kalbfleisch, JD, 27, 73 Kalimuthu, K, 32, 91 Kang, SJ, 30, 85 Kang, X, 39, 116 Kao, TC, 36, 108 Karlis, D, 20, 51 Karunamuni, R, 21, 56 Keating, N, 25, 67 Keles, S, 29, 80 Kelsey, KT, 35, 104 Khan, AH, 31, 87 Kim, H, 30, 83 Kim, JH, 33, 96 Kim, JK, 35, 103 Kim, K, 33, 96 Kim, M, 20, 22, 51, 56 Klebanov, L, 39, 118 Kodell, RL, 26, 72 Koestler, DC, 35, 104 Kolaczyk, E, 29, 81 Kolahi, J, 33, 94

Kong, F, 33, 97 Kong, M, 20, 50 Konig, C, 25, 69 Kooperberg, C, 26, 71 Kopko, S, 39, 119 Kordzakhia, G, 30, 83 Kosorok, M, 36, 107 Koury, K, 34, 98 Krams, M, 24, 64 Kringle, R, 26, 69 Kuan, PF, 18, 29, 45, 80 Kumar, A, 20, 50 Kurum, E, 36, 108 Kwak, M, 26, 71 Kwate, NO, 24, 65 Kwiek, J, 38, 115 Kwon, S, 21, 52 Lagakos, S, 20, 51 Lai, TL, 23, 61 Lai, Y, 32, 93 Laird, N, 34, 98 Lamont, E, 25, 67 Lan, G, 38, 115 Lan, KKG, 28, 76 Landi, MT, 17, 40 Landrum, MB, 25, 67 Landsman, V, 28, 76 Lawrence, J, 20, 51 Lawson, A, 24, 65 LeBlanc, M, 26, 71 Ledeley, MNA, 23, 59 Lee, CH, 32, 90 Lee, J, 22, 33, 56, 96 Lee, JJ, 31, 33, 86, 95 Lee, MT, 19, 39, 47, 117 Lee, S, 37, 112 Lee, SH, 33, 36, 96, 106 Lee, SY, 33, 96 Lee, T, 27, 75 Legg, J, 20, 49 Leng, C, 23, 63 Leu, C, 22, 22, 56, 57 Levenstein, M, 34, 98 Levin, B, 22, 22, 23, 56, 57, 61 Levina, L, 18, 35, 46, 102 Levitan, B, 29, 82 Lewis, JW, 30, 84 Li, B, 29, 32, 80, 92 Li, D, 30, 83 Li, G, 27, 28, 31, 75, 77, 87 Li, H, 18, 20, 24, 27, 29, 34, 45, 50, 66, 73, 80, 98 Li, J, 18, 19, 44, 46 Li, L, 24, 35, 36, 63, 102, 105 Li, M, 38, 38, 114, 115 Li, N, 27, 27, 75, 75 Li, P, 20, 25, 29, 52, 69, 81

Li, Q, 27, 37, 73, 110 Li, R, 35, 36, 102, 103, 108 Li, X, 27, 36, 37, 39, 73, 108, 110, 116 Li, Y, 17, 18, 18, 19, 21, 23, 25, 30, 41, 42, 44, 47, 54, 62, 85, 98 Liang, F, 21, 23, 54, 61 Liang, H, 18, 29, 37, 45, 80, 109 Liang, K, 29, 80 Liao, D, 28, 76 Liao, H, 38, 114 Liao, J, 30, 30, 39, 84, 84, 117 Liaw, A, 25, 67 Lillard, DR, 25, 97 Lim, J, 36, 106 Lim, P, 34, 99 Lima, JA, 21, 54 Lin, CD, 18, 45 Lin, D, 21, 26, 39, 54, 118 Lin, DT, 21, 53 Lin, FNA, 31, 88 Lin, S, 35, 104 Lin, X, 26, 70 Lin, Y, 22, 58 Little, RJ, 27, 36, 73, 106 Liu, A, 17, 40 Liu, B, 20, 49 Liu, C, 17, 22, 40, 56 Liu, D, 22, 26, 57, 70 Liu, F, 39, 118 Liu, FG, 27, 73 Liu, GF, 23, 60 Liu, H, 30, 84 Liu, J, 17, 38, 41, 115 Liu, JS, 35, 102 Liu, L, 19, 49 Liu, N, 34, 98 Liu, Q, 34, 99 Liu, R, 22, 39, 57, 117 Liu, S, 33, 95 Liu, X, 32, 34, 91, 100 Liu, XS, 29, 80 Liu, Y, 18, 24, 43, 64 Lix, L, 32, 90 Lo, S, 18, 44 Loh, JM, 24, 65 Lou, W, 32, 90 Lu, N, 23, 60 Lu, S, 22, 58 Lu, T, 18, 45 Lu, W, 24, 63 Lu, Y, 22, 57 Lue, H, 25, 69 Lunceford, J, 33, 95 Luo, X, 35, 101 Lv, J, 35, 102

Ma, C, 30, 85 Ma, H, 38, 114 Ma, P, 38, 115 Ma, S, 29, 29, 79, 80, 81 Maadooliat, M, 23, 62 MacNab, Y, 32, 90 Malbecq, W, 38, 114 Malecki, M, 17, 40 Mallik, A, 21, 55 Manson, S, 36, 107 Marcus, SM, 27, 75 Markatou, M, 25, 67 Marron, JS, 20, 24, 52, 64 Marsit, CJ, 35, 104 McCormick, T, 38, 113 McIntyre, J, 37, 109 McKeague, I, 33, 94 McKeague, IW, 28, 77 Meara, E, 25, 67 Mehrotra, D, 17, 25, 41, 69 Mehta, C, 34, 99 Meng, Z, 26, 38, 69, 114 Meyer, CA, 26, 71 Miao, W, 23, 61 Michailidis, G, 18, 21, 46, 55 Michor, F, 28, 78 Mick, R, 30, 85 Millen, B, 19, 48 Min-Seok, M, 33, 96 Mitchell, C, 36, 107 Mo, Q, 38, 114 Moore, RH, 30, 86 Moreira, C, 27, 74 Mueller, H, 36, 105 Nagarajan, R, 26, 72 Nan, B, 21, 55 Natanegara, F, 31, 87 Nayak, TK, 31, 87 Newby, F, 39, 119 Newton, AS, 32, 90 Ng, T, 30, 84 Ngai, J, 31, 87 Nguyen, H, 34, 100 Nguyen, HQ, 37, 111 Nguyen, T, 28, 76 Nguyen,, H, 19, 48 Niewiadomska-Bugaj, 31, 87 Ning, J, 27, 30, 74, 84 Niu, Y, 19, 47 Niwitpong, S, 31, 86 Nixon, R, 29, 82 Nordman, D, 24, 66 Nuel, G, 39, 116 Nygren, LM, 36, 107 O’Quigley, J, 19, 49 Oh, S, 33, 96 Ohlssen, D, 32, 89

M,

Oliveira, P, 29, 82 Olshen, AB, 38, 114 Ombao, H, 37, 109 Orden, AV, 33, 94 Ouyang, SP, 38, 114 Paik, J, 26, 70 Pallos, LL, 28, 78 Pan, W, 34, 98 Pang, Z, 19, 49 Panichkitkosolkul, W, 31, 86 Paredes, A, 18, 43 Park, J, 33, 38, 94, 113 Park, T, 33, 96 Parker, JD, 28, 76 Parmeter, C, 37, 110 Pena, E, 33, 96 Peng, Y, 30, 32, 82, 91 Pennell, M, 39, 117 Pepler, DJ, 39, 117 Pfeiffer, R, 19, 46 Phillips, MR, 27, 74 Pinheiro, J, 24, 28, 37, 64, 76, 110 Pourahmadi, M, 37, 109 Pregibon, D, 17, 41 Prentice, R, 26, 71 Qi, L, 33, 94 Qian, L, 35, 103 Qian, M, 33, 94 Qian, PZ, 18, 45 Qiao, X, 24, 64 Qin, G, 35, 103 Qin, J, 17, 20, 27, 29, 35, 40, 52, 74, 82, 103 Qiu, P, 21, 55 Qu, A, 21, 55 Quan, H, 26, 38, 69, 114 Radchenko, P, 18, 24, 43, 64 Rao, JS, 28, 76 Raunig, D, 33, 95 Rawal, B, 39, 118 Ray, H, 28, 79 Reiss, P, 36, 106 Ren, Q, 24, 65 Roberts, F, 34, 100 Rooney, J, 19, 48 Rosales, MAC, 31, 87 Rosychuk, RJ, 32, 90 Roubideaux, Y, 36, 107 Ruan, S, 23, 60 Rubin, D, 30, 83 Ryeznik, Y, 30, 85 Salsburg, D, 33, 97 Sanalkumar, R, 29, 80 Sankoh, AJ, 27, 73 Sapp II, JH, 28, 78

Sarkar, S, 32, 92 Scharfstein, D, 29, 82 Schell, MJ, 39, 118 Schenker, N, 28, 76 Schindler, J, 24, 31, 65, 86 Schlain, B, 30, 84 Schutt, R, 17, 41 Schwartzman, A, 26, 71 Selassie, AW, 34, 100 Sen, B, 21, 55 Senturk, D, 36, 108 Seymour, L, 39, 117 Shah, IA, 31, 87 Shao, Y, 20, 21, 52, 53 Shen, M, 24, 25, 66, 66 Shen, R, 38, 114 Shen, Y, 27, 74 Shieh, M, 37, 109 Shiffman, S, 25, 97 Shih, WJ, 22, 38, 58, 114 Shim, Y, 28, 78 Shults, J, 30, 83 Siegel, ER, 26, 72 Siegmund, K, 35, 103 Simon, R, 33, 95 Sinha, A, 31, 88 Sirer, MI, 17, 41 Snapinn, S, 22, 58 Song, PX, 19, 47 Song, Q, 23, 28, 61, 77 Song, R, 29, 81 Song, Y, 24, 65 Soon, G, 19, 48 Soukup, M, 19, 46 Srivastava, S, 32, 93 Staudenmayer, J, 37, 109 Stefanski, LA, 37, 109 Stellman, SD, 28, 78 Stephens, M, 36, 108 Stittelman, O, 28, 77 Stodden, V, 23, 61 Strawderman, RL, 19, 49 Strawderman, W, 19, 48 Styner, M, 20, 52 Su, H, 29, 80 Su, Y, 17, 41 Sun, G, 26, 69 Sun, J, 34, 101 Sun, L, 34, 99 Sun, N, 32, 93 Sun, T, 36, 108 Sun, W, 26, 32, 39, 71, 92, 118 Sun, X, 25, 69 Sung, M, 26, 72 Sverdlov, O, 30, 85 Szabo, A, 37, 111 Tamhane, A, 19, 48 Tang, J, 22, 57 Tang, L, 20, 20, 50, 50

Tang, M, 28, 78 Taslim, C, 35, 104 Thall, PF, 37, 111 Therneau, T, 22, 59 Thompson, D, 27, 75 Tian, L, 22, 28, 32, 36, 56, 77, 91, 107 Tibshirani, R, 22, 56 Tilley, BC, 34, 100 Ting, N, 30, 85 Todem, D, 35, 105 Tong, X, 19, 47 Triche Jr., T, 35, 103 Tsai, W, 22, 27, 59, 74 Tsai, YT, 36, 108 Tseng, C, 21, 27, 53, 75 Tsong, Y, 21, 24, 25, 54, 66 Tsou, H, 21, 53, 54 Tu, Y, 27, 74 Tulupyev, A, 30, 84 ` U˜na-Alvarez, JD, 27, 74 van der Laan, M, 28, 77 Verducci, J, 38, 115 Viles, W, 29, 81 Wang, C, 20, 29, 51, 82 Wang, F, 18, 25, 42, 69 Wang, G, 37, 111 Wang, H, 18, 20, 25, 25, 44, 52, 97, 98 Wang, J, 20, 31, 34, 36, 51, 87, 100, 106 Wang, L, 20, 26, 28, 29, 37, 39, 50, 69, 77, 79, 111, 116, 117 Wang, M, 26, 31, 70, 87 Wang, N, 17, 18, 37, 41, 44, 110 Wang, Q, 25, 68 Wang, R, 20, 51 Wang, S, 18, 19, 29, 31, 35, 38, 45, 47, 79, 86, 103, 104, 114 Wang, T, 34, 101 Wang, W, 17, 38, 38, 40, 114, 115 Wang, WW, 27, 73 Wang, X, 30, 33, 85, 94 Wang, Y, 33, 36, 38, 95, 105, 113 Wang, Z, 18, 38, 44, 113 Wang , Y, 36, 108 Wei, L, 26, 28, 36, 69, 77, 107 Wei, W, 30, 84 Wei, Y, 36, 37, 106, 109 Weihu, C, 17, 42 Wells, MT, 24, 63 Wendy, C, 25, 68

Wheeler, D, 31, 88 Whitmore, GA, 19, 47 Wiencke, JK, 35, 104 Wileyto, EP, 25, 98 Wilson, A, 24, 66 Wolfson, D, 27, 74 Wong, K, 27, 74 Wong, P, 38, 114 Wong, WK, 30, 85 Wu, C, 20, 25, 37, 38, 50, 68, 110, 113 Wu, CO, 21, 26, 54, 71 Wu, D, 28, 78 Wu, H, 18, 20, 29, 45, 50, 81 Wu, J, 21, 34, 56, 98 Wu, JJ, 28, 79 Wu, MC, 36, 107 Wu, Q, 28, 78 Wu, R, 18, 23, 44, 62 Wu, W, 33, 96 Wu, WB, 37, 109 Wu, X, 26, 34, 72, 100 Wu, Y, 18, 22, 24, 43, 58, 64 Wulfsohn, M, 19, 48 Xie, M, 22, 34, 57, 100 Xie, S, 39, 119 Xing, E, 38, 112 Xing, H, 23, 61 Xiong, C, 32, 91 Xiong, X, 28, 75 Xu, H, 26, 69 Xu, J, 23, 60 Xu, L, 34, 99

Xu, Q, 36, 108 Xu, Y, 23, 60 Xue, H, 20, 50 Xue, L, 19, 21, 35, 49, 55, 102 Yan, X, 23, 60 Yang, B, 35, 103 Yang, C, 34, 100 Yang, L, 27, 28, 35, 36, 73, 77, 105 Yang, M, 31, 88 Yang, MC, 32, 91 Yang, Q, 21, 53 Yang, S, 33, 95 Yang, X, 33, 94 Yang, Y, 35, 102 Yao, C, 39, 116 Yao, F, 36, 105 Yi, G, 21, 55 Yi, GY, 37, 110 Yin, X, 25, 68 Ying, G, 30, 86 Ying, Z, 19, 34, 47, 100 Yoon, F, 25, 67 Yu, C, 19, 20, 48, 49 Yu, D, 39, 118 Yu, K, 17, 21, 40, 54 Yu, M, 28, 76 Yu, T, 23, 28, 62, 77 Yu, W, 19, 47 Yuan, Y, 20, 52 Yukich, J, 24, 66

Zagari, MJ, 22, 58 Zaidat, OO, 37, 111 Zaslavsky, AM, 31, 89 Zelterman, D, 30, 84 Zeng, D, 36, 107 Zeng, L, 27, 75 Zhang, B, 21, 35, 56, 101 Zhang, C, 23, 25, 26, 29, 36, 62, 68, 72, 81, 108 Zhang, D, 24, 25, 25, 63, 68, 68 Zhang, G, 19, 28, 49, 76 Zhang, H, 17, 19, 27, 40, 47, 73 Zhang, HH, 24, 64 Zhang, J, 19, 24, 32, 37, 38, 38, 46, 65, 91, 109, 114, 115 Zhang, K, 34, 98 Zhang, M, 24, 63 Zhang, W, 19, 23, 36, 46, 63, 106 Zhang, X, 31, 32, 34, 89, 93, 101 Zhang, Y, 28, 34, 37, 77, 99, 111 Zhang, Z, 17, 24, 40, 63 Zhao, H, 32, 38, 93, 113 Zhao, L, 28, 36, 77, 107 Zhao, O, 32, 91 Zhao, P, 38, 114 Zhao, Y, 35–37, 102, 107,

111 Zhao, Z, 32, 37, 93, 110 Zhen, B, 30, 84 Zheng, G, 21, 26, 54, 71 Zheng, S, 36, 105 Zheng, T, 18, 34, 38, 44, 100, 113 Zheng, X, 18, 42 Zheng, Y, 26, 31, 70, 89 Zheng, Z, 39, 118 Zhi, D, 34, 98 Zhong, J, 25, 25, 66, 66 Zhong, L, 19, 48 Zhong, W, 35, 102 Zhou, H, 31, 87 Zhou, J, 21, 55 Zhou, M, 29, 80 Zhou, X, 18, 20, 45, 50 Zhou, XK, 39, 118 Zhou, Y, 27, 73 Zhu, H, 20, 26, 36, 39, 52, 70, 105, 117 Zhu, J, 18, 19, 31, 31, 35, 44, 46, 47, 88, 89, 102 Zhu, L, 25, 35, 35, 37, 68, 102, 102, 109 Zhu, Z, 27, 73 Zohar, S, 23, 59 Zou, F, 26, 71 Zou, H, 35, 102, 103 Zou, J, 36, 105

Statistics in Biosciences Journal of the International Chinese Statistical Association

Editors-in-Chief: K.K. Gordon Lan, Johnson & Johnson, Raritan, NJ; Xihong Lin, Harvard School of Public Health, Boston, MA.
Statistics in Biosciences (SIB) is published twice a year in print and electronic form. It aims at the development and application of statistical methods and their interface with other quantitative methods, such as computational and mathematical methods, in the biological and life sciences, health sciences, and biopharmaceutical and biotechnological sciences.

Now submit and track your manuscript to Statistics in Biosciences with ease, using Editorial Manager, Springer's fully secure web-based manuscript handling system for online submission, peer review and tracking: http://www.editorialmanager.com/sibs/

SIB publishes scientific papers and review articles in four sections, with the first two serving as the primary sections. Original Articles present novel statistical and quantitative methods in the biosciences. Bioscience Case Studies and Practice Articles present papers that advance statistical practice in the biosciences, such as case studies, innovative applications of existing methods that further the understanding of subject-matter science, and evaluations of existing methods and data sources. Review Articles survey an area of statistical and quantitative methodology, software, or data sources in the biosciences. Commentaries provide perspectives on research topics or policy issues of current quantitative interest in the biosciences, reactions to an article published in the journal, and scholarly essays. Substantive science is essential in motivating and demonstrating the methodological development and its use for an article to be acceptable. Articles published in SIB share the goal of promoting evidence-based real-world practice and policy making through effective and timely interaction and communication between statisticians and quantitative researchers and subject-matter scientists in the biosciences.
