CS7700
Data Intensive Distributed Computing
Fall 2006 - Paper Reading List
Background:
T. Hey, A. Trefethen, "The Data Deluge: An e-Science Perspective", in Grid Computing - Making the Global Infrastructure a Reality, chapter 36, pp. 809-824. Wiley and Sons.
W. E. Johnston, "High-Speed, Wide Area, Data Intensive Computing: A Ten Year Retrospective", 7th IEEE Symposium on High Performance Distributed Computing, July 29-31, 1998, Chicago, IL.
I. Foster and C. Kesselman, "Computational Grids", in The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999.
A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", Journal of Network and Computer Applications, 23:187-200, 2001.
Applications:
J. Lee, B. Tierney, W. E. Johnston, "Data Intensive Distributed Computing: A Medical Application Example", HPCN Europe 1999, pp. 150-158.
K. Holtman, "CMS Data Grid System Overview and Requirements", CMS Note 2001/037, CERN, July 2001.
B. Spencer Jr., T.A. Finholt, I. Foster, C. Kesselman, et al., "NEESgrid: A Distributed Collaboratory for Advanced Earthquake Engineering Experiment and Simulation", 13th World Conference on Earthquake Engineering, August 2004.
S. Barnard, R. Biswas, S. Saini, R. Van der Wijngaart, M. Yarrow, L. Zechter, I. Foster, O. Larsson, "Large-Scale Distributed Computational Fluid Dynamics on the Information Power Grid using Globus", Proceedings of Frontiers’99, 1999.
Grid Toolkits:
I. Foster, C. Kesselman, "Globus: A Metacomputing Infrastructure Toolkit", International Journal of Supercomputer Applications, 11(2):115-128, 1997.
D. Thain, T. Tannenbaum, and M. Livny, "Condor and the Grid", in Grid Computing: Making the Global Infrastructure a Reality, John Wiley, 2003.
G. Allen, K. Davis, T. Goodale, A. Hutanu, et al., "The Grid Application Toolkit: Toward Generic and Easy Application Programming Interfaces for the Grid", Proceedings of the IEEE, Volume 93, Issue 3, pp. 534-550, March 2005.
K. Seymour, A. Yarkhan, S. Agrawal, J. Dongarra, "NetSolve: Grid Enabling Scientific Computing Environments", Grid Computing and New Frontiers of High Performance Processing, Elsevier Press, Advances in Parallel Computing, 14, 2005.
Distributed Storage:
B. Tierney, J. Lee, B. Crowley, M. Holding, J. Hylton, F. L. Drake, "A Network-Aware Distributed Storage Cache for Data Intensive Environments", in Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, pages 185-193, Redondo Beach, CA, August 1999.
D. Teaff, R. W. Watson, and R. A. Coyne, "The Architecture of the High Performance Storage System (HPSS)", Proceedings of the Goddard Conference on Mass Storage and Technologies, College Park, MD, March 1995.
A. Rajasekar, M. Wan, and R. Moore, "MySRB & SRB - Components of a Data Grid", 11th International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 24-26, 2002.
A. Shoshani, A. Sim, J. Gu, "Storage Resource Managers: Middleware Components for Grid Storage", Proceedings of the Nineteenth IEEE Symposium on Mass Storage, 2002.
Grid File Systems:
F. Schmuck and R. Haskin, "GPFS: A Shared-Disk File System for Large Computing Clusters", in Proceedings of the 1st USENIX Conference on File and Storage Technologies, Monterey, CA, January 28-30, 2002.
P. H. Carns, W. B. Ligon, R. B. Ross, and R. Thakur, "PVFS: A Parallel File System for Linux Clusters", Proceedings of the 4th Annual Linux Showcase and Conference, 2000.
J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, et al., "OceanStore: An Architecture for Global-Scale Persistent Storage", in Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), November 2000.
O. Tatebe, Y. Morita, S. Matsuoka, N. Soda, and S. Sekiguchi, "Grid Datafarm Architecture for Petascale Data Intensive Computing", Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid), pp. 102-110, 2002.
Remote I/O:
I. Foster, D. Kohr, R. Krishnaiyer, J. Mogill, "Remote I/O: Fast Access to Distant Storage", Proc. Workshop on I/O in Parallel and Distributed Systems (IOPADS), pp. 14-25, 1997.
J. Lee, X. Ma, R. Ross, R. Thakur, and M. Winslett, "RFS: Efficient and Flexible Remote File Access for MPI-IO", Proceedings of the International Conference on Cluster Computing, 2004.
High Performance Data Transfers:
B. Allcock, J. Bester, J. Bresnahan, et al., "Data Management and Transfer in High Performance Computational Grid Environments", Parallel Computing Journal, Vol. 28 (5), May 2002, pp. 749-771.
S. Vazhkudai, J. M. Schopf, and I. Foster, "Predicting the Performance of Wide Area Data Transfers", Proceedings of the 16th International Parallel and Distributed Processing Symposium (IPDPS 2002), April 2002.
E. He, J. Leigh, O. Yu, and T. A. DeFanti, "Reliable Blast UDP: Predictable High Performance Bulk Data Transfer", IEEE Cluster Computing Conference, Chicago, IL, 2002.
T. Kelly, "Scalable TCP: Improving Performance in Highspeed Wide Area Networks", ACM SIGCOMM Computer Communication Review, 2003.
Data Staging and Replication:
W. R. Elwasif, J. S. Plank, and R. Wolski, "Data Staging Effects in Wide Area Task Farming Applications", IEEE International Symposium on Cluster Computing and the Grid, Brisbane, Australia, May 2001.
D. Aksoy, M. J. Franklin, S. Zdonik, "Data Staging for On-Demand Broadcast", Proceedings of Very Large Databases (VLDB), 2001.
H. Stockinger, A. Samar, B. Allcock, I. Foster, K. Holtman, and B. Tierney, "File and Object Replication in Data Grids", Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.
A. Chervenak, B. Schwartzkopf, H. Stockinger, et al., "Giggle: A Framework for Constructing Scalable Replica Location Services", Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, Baltimore, Maryland, 2002.
Traditional Scheduling:
V. Hamscher, U. Schwiegelshohn, A. Streit, and R. Yahyapour, "Evaluation of Job Scheduling Strategies for Grid Computing", Grid Workshop at 7th International Conference on High Performance Computing (HiPC-2000), Bangalore, India, LNCS 1971, pp. 191-202.
K. Ranganathan and I. Foster, "Computation Scheduling and Data Replication Algorithms for Data Grids", Grid Resource Management: State of the Art and Future Trends, Kluwer Academic Publishers, 2003.
F. D. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, "Application-level scheduling on distributed heterogeneous networks", Proceedings of the 1996 ACM/IEEE Conference on Supercomputing, 1996.
A. Alhusaini, V. K. Prasanna, and C.S. Raghavendra, "A Unified Resource Scheduling Framework for Heterogeneous Computing Environments", in Proceedings of the Heterogeneous Computing Workshop, pages 156-165, San Juan, PR, April 1999.
Data Management and Co-scheduling:
T. Kosar and M. Livny, "A Framework for Reliable and Efficient Data Placement in Distributed Computing Systems", Journal of Parallel and Distributed Computing, Volume 65, Issue 10, October 2005.
D. Thain, J. Basney, S.C. Son, and M. Livny, "The Kangaroo Approach to Data Movement on the Grid", Tenth IEEE Symposium on High Performance Distributed Computing (HPDC-10), San Francisco, California, August 7-9, 2001.
A. Romosan, D. Rotem, A. Shoshani, and D. Wright, "Co-Scheduling of Computation and Data on Computer Clusters", in Proceedings of SSDBM 2005, pp. 103-112.
J. Basney and M. Livny, "Improving Goodput by Co-scheduling CPU and Network Capacity", in Proceedings of International Conference on High Performance Distributed Computing (HPDC), 1999.
W. Allcock, J. Bester, J. Bresnahan, I. Foster, J. Gawor, J. A. Insley, J. M. Link, and M. E. Papka, "GridMapper: A Tool for Visualizing the Behavior of Large-Scale Distributed Systems", 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), pp. 179-187, Edinburgh, Scotland, July 24-26, 2002.
N. Karonis, M. Papka, J. Binns, J. Bresnahan, J. Insley, D. Jones, and J. Link, "High-Resolution Remote Rendering of Large Datasets in a Collaborative Environment", Future Generation Computer Systems (FGCS), 2003.
C. Zhang, J. Leigh, T. A. DeFanti, M. Mazzucco, and R. Grossman, "TeraScope: Distributed Visual Data Mining of Terascale Data Sets over Photonic Networks", Future Generation Computer Systems (FGCS), 2003.
J. Leigh, T. DeFanti, R. Singh, F. Karayannis, "TeraVision: a High Resolution Graphics Streaming Device for Amplified Collaboration Environments", Future Generation Computer Systems (FGCS), 2003.
Workflow Management:
P. Couvares, T. Kosar, A. Roy, J. Weber, and K. Wenger, "Workflow Management in Condor", to appear in Workflows for e-Science, Springer Press, 2006.
B. Ludascher, I. Altintas, C. Berkley, D. Higgins, et al., "Scientific Workflow Management and the Kepler System", Concurrency and Computation: Practice & Experience, Special Issue on Scientific Workflows, 2005.
I. Foster, J. Voeckler, M. Wilde, and Y. Zhao, "Chimera: A Virtual Data System for Representing, Querying and Automating Data Derivation", Proceedings of the 14th Conference on Scientific and Statistical Database Management, Edinburgh, Scotland, July 2002.
E. Deelman, J. Blythe, Y. Gil, C. Kesselman, et al., "Pegasus: Mapping Scientific Workflows onto the Grid", Across Grids Conference, Nicosia, 2004.
Future Challenges:
DOE Office of Science, "The Data Management Challenge", Report from the DOE Office of Science Data-Management Workshops, March-May 2004.
NSF, "Research Challenges in Distributed Computer Systems", NSF Report, 2005.
Reference Papers*: (*These papers will not be discussed in class, but they are good reference and background papers to read!)
T. Kosar, "Data Placement in Widely Distributed Systems", Ph.D. Thesis, University of Wisconsin-Madison, August 2005.
J.H. Saltzer, D.P. Reed, and D.D. Clark, "End-To-End Arguments in System Design", ACM Trans. on Computer Systems, 2(4), November 1984, pp. 277-288.
J.M. Schopf and B. Nitzberg, "Grids: Top Ten Questions", Scientific Programming, special issue on Grid Computing, 10(2):103-111, August 2002.
S. Venugopal, R. Buyya, and K. Ramamohanarao, "A Taxonomy of Data Grids for Distributed Data Sharing, Management, and Processing", ACM Computing Surveys (CSUR), 2006.
J. Yu and R. Buyya, "A Taxonomy of Scientific Workflow Systems for Grid Computing", SIGMOD Record, 2005.