Administrative information
Administrative course information is available here
We use the inf-2202-f15@list.uit.no mailing list to send important information.
We have the following rooms and hours:
- Tuesdays 14:15-16:00, A016.
- Thursdays 14:15-16:00, 2.019AUD (Teknobygget).
- Fridays 10:15-11:00, A016. (We will not usually use this hour)
Staff
- Lars Ailo Bongo (larsab@cs.uit.no), Office: A259
- Ibrahim Umar (ibrahim.umar@uit.no), Office: A238
Lecture plan
Lecture | Date | Subject | Lecturer |
---|---|---|---|
L1 | Fri 14.08 | Introduction | Lars Ailo |
L2 | Thu 20.08 | Threads and synchronization primitives | Lars Ailo |
L3 | Thu 27.08 | Guest lecture: Go | Giacomo Tartari |
L4 | Thu 03.09 | Parallel architectures | Lars Ailo |
L5 | Thu 10.09 | Parallel programs | Lars Ailo |
L6 | Tue 15.09 | Programming for performance | Lars Ailo |
L7 | Thu 24.09 | Performance evaluation | Lars Ailo |
L8 | Thu 01.10 | Parallel program performance evaluation | Lars Ailo |
- | Thu 08.10 | Postponed due to a major water leak! | - |
L10 | Thu 15.10 | Guest lecture: Scala and Spark | Inge Alexander Raknes |
L9 | Thu 22.10 | Data-intensive computing | Lars Ailo |
L11 | Thu 29.10 | Spark libraries | Lars Ailo |
- | Thu 05.11 | Cancelled | Lars Ailo |
L12 | Tue 10.11 | Guest lecture: Stallo (no slides) | Steinar Trædal-Henden |
L13 | Tue 17.11 | Summary (no slides) | Lars Ailo |
- | Thu 26.11 | Exam | - |
Readings
All lecture notes are mandatory. In addition, the following readings are mandatory unless otherwise noted:
- Introduction
- None
- Threads and synchronization primitives (operating systems course recap):
- Modern operating systems, 3ed, Andrew S. Tanenbaum. Prentice Hall. 2007. Chapters: 2.2, 2.3, 2.5, 10.3, 11.4
- Alternative to MOS: the chapters on threading, IPC mechanisms, and classical IPC problems in another operating systems textbook.
- Go
- Rob Pike. SPLASH keynote talk
- A tour of Go
- How to write Go code
- Effective Go
- Go concurrency patterns (video, slides)
- Advanced Go concurrency patterns (video, slides)
- Parallel architectures
- Computer Organization and Design: the Hardware/Software Interface, 4ed. David A. Patterson, John L. Hennessy. Morgan Kaufmann. 2011. Chapter 8: “Multicores, Multiprocessors, and Clusters”.
- Parallel programs
- None
- Programming for performance
- None
- Performance evaluation
- None
- Parallel performance evaluation
- Data-intensive computing
- “Jim Gray on eScience”, and chapters 1 and 2 from The Fourth Paradigm: Data-Intensive Scientific Discovery. Edited by Tony Hey, Stewart Tansley, and Kristin Tolle. 2010.
- Optional: Google File System paper
- Optional: MapReduce paper
- Optional: Exascale Computing and Big Data
- Scala and Spark
- None
- Spark libraries
- Optional: videos, slides, and research papers at: http://spark.apache.org/documentation.html
The following are suggested additional readings:
- Parallel Computer Architecture: A Hardware/Software Approach. David Culler, J.P. Singh, Anoop Gupta. Morgan Kaufmann. 1998.
- This book has a great introduction to parallel programming.
- There is one copy in the library. Please be nice to your fellow students and do not borrow that copy for an extended period.
- The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. R. K. Jain. Wiley. 1991.
- A very good book about performance analysis.
- There is one copy in the library. Please be nice to your fellow students and do not borrow that copy for an extended period.
- Computer Architecture: A Quantitative Approach, 5ed. John L. Hennessy, David A. Patterson. Morgan Kaufmann. 2011.
- This book has a thorough description of different parallel architectures.
- You can purchase this book from your favorite bookstore.
- The Fourth Paradigm: Data-Intensive Scientific Discovery. Edited by Tony Hey, Stewart Tansley, and Kristin Tolle. 2010.
- This collection of essays describes many of the opportunities and challenges for data-intensive computing in different scientific fields.
- The book is freely available as an ebook.
- Hadoop: The Definitive Guide, 3ed. Tom White. O’Reilly. 2012.
- Nice overview of the Hadoop ecosystem, including detailed descriptions of HDFS and Hadoop MapReduce.
- This book is available in the library as a Safari ebook.
- HBase: The Definitive Guide. Lars George. O’Reilly. 2012.
- Detailed description of HBase, including tips for tuning the system.
- This book is available in the library as a Safari ebook.
- Learning Spark. Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia. O’Reilly. 2015.
- Advanced Analytics with Spark. Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills. O’Reilly. 2015.
Mandatory assignments
Project | Start | Due | Subject | Lecturer |
---|---|---|---|---|
P1 | 18.08.15 | 14.09.15 | Concurrent B+ trees (private repository and public zip) | Ibrahim |
P2 | 17.09.15 | 12.10.15 | Deduplication (private repository and public zip) | Ibrahim |
P3 | 13.10.15 | 06.11.15 | PageRank using Spark on AWS (private repository and public zip) | Lars Ailo and Ibrahim |
Note! The mandatory assignment text and pre-code are available in private repositories accessible only to members of the uit-inf-2202 GitHub organization.
Exercises
- Introduction
- None
- Threads and synchronization primitives
- Compare the overhead of forking a process vs. creating a Pthread
- Compare the overhead of forking a process vs. creating a Python thread
- Implement a solution to the following classical IPC problems using Pthreads/Python threads and semaphores/condition variables. Note that you also need to generate a use case, test data, and useful output:
- Producer/consumer
- Reader/writer
- Sleeping barber
- Dining philosophers
- Modify the code in exercise 3 to use message passing.
- Go
- Take a tour of Go
- Implement the classical IPC problems from exercise 2.3 in Go.
- Parallel architectures
- None
- Parallel programs
- Implement a simplified BLAST search program in Go that does similarity search on two lists of random DNA sequences.
- Implement a heat distribution program using Pthreads and/or Go.
- Programming for performance
- Implement a tuple space in Python with semantics similar to Linda. Use your tuple space to implement a parallel version of Mandelbrot that uses dynamic assignment (pool of tasks).
- Performance evaluation
- None
- Parallel program performance evaluation
- Data-intensive computing
- Create an account at AWS and calculate the approximate cost for analyzing 1TB and 1PB of data.
- Implement word count in MapReduce and run it on uvrocks or AWS.
- Implement grep in MapReduce and run it on uvrocks or AWS.
- Scala and Spark
- Run the provided WordCount in assignment 3 on AWS
- Implement grep in Scala and run it on AWS
- Spark libraries
- Refactor your assignment 3 code to use GraphX
- Stallo
- None
- Summary lecture
- Exam from 2013
- Sample exam (from 2013)
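
As a possible starting point for the producer/consumer exercise under "Threads and synchronization primitives", here is a minimal sketch using Python threads and semaphores. It is not the course's reference solution. The names `BoundedBuffer`, `producer`, `consumer`, `run_demo`, and `SENTINEL` are hypothetical, and the use case (squaring integers) is only a placeholder for the use case and test data the exercise asks you to design yourself.

```python
import threading
from collections import deque

SENTINEL = None  # hypothetical end-of-stream marker sent by the producer


class BoundedBuffer:
    """Fixed-capacity FIFO buffer guarded by two counting semaphores and a mutex."""

    def __init__(self, capacity):
        self._items = deque()
        self._mutex = threading.Lock()                      # protects the deque
        self._empty_slots = threading.Semaphore(capacity)   # free slots left
        self._filled_slots = threading.Semaphore(0)         # items available

    def put(self, item):
        self._empty_slots.acquire()      # blocks while the buffer is full
        with self._mutex:
            self._items.append(item)
        self._filled_slots.release()     # signal that one item is available

    def get(self):
        self._filled_slots.acquire()     # blocks while the buffer is empty
        with self._mutex:
            item = self._items.popleft()
        self._empty_slots.release()      # signal that one slot is free again
        return item


def producer(buffer, values):
    for value in values:
        buffer.put(value)
    buffer.put(SENTINEL)                 # tell the consumer to stop


def consumer(buffer, results):
    while True:
        item = buffer.get()
        if item is SENTINEL:
            break
        results.append(item * item)      # placeholder "work": square the value


def run_demo(n_items=20, capacity=4):
    """Run one producer and one consumer and return the consumer's results."""
    buffer = BoundedBuffer(capacity)
    results = []
    threads = [
        threading.Thread(target=producer, args=(buffer, range(n_items))),
        threading.Thread(target=consumer, args=(buffer, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

With a single producer and a single consumer the results arrive in FIFO order. Extending the sketch to several producers or consumers requires one sentinel per consumer (or another shutdown scheme), which is a design choice left to the exercise.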