Project Info

The goal of the project is to apply the knowledge you learned in class in a small-scale study. The project deliverables are a short proposal outlining your motivation and plan, a final report containing methodology and findings, an implementation of your project, and a short class presentation. We expect projects to be worked on in groups of two students, and we highly encourage you to team up. As always, please familiarize yourself with and respect the high standards of academic integrity that the University of Toronto adheres to (https://www.academicintegrity.utoronto.ca/).

Several tutorial sessions will be dedicated to the project. Make sure that you attend these and follow the guidelines and tips presented. In addition, we strongly encourage you to discuss your project on Piazza, with the TAs during office hours, and with your classmates. Since each group's project is different, you can freely exchange helpful tips amongst yourselves and online. This document will also evolve to reflect frequently asked questions, so make sure to check in regularly.

Project structure

We encourage exploratory and experimental projects and provide several potential project ideas. You can also work on your own idea; in that case, please check in with us to see whether it aligns with our expectations. No matter what you work on, we expect rigor and thoroughness in your investigation and adherence to scientific standards in writing and presentation.

Possible projects include:

Implementing or formalizing an exciting environment for reinforcement learning and solving it with state-of-the-art reinforcement learning algorithms. Your contribution will be identifying, formalizing and implementing a control problem in a simulator of your choice and evaluating how different algorithm and hyperparameter choices influence the performance of your agent in your domain.
Thoroughly investigating choices such as algorithmic variants (DQN, DDQN, PER) on a suite of known domains. Such a project will probably require more computational resources, so make sure up front that you will be able to complete it. Your main contribution will be a quantitative and qualitative study of the algorithms. Make sure to clearly identify the findings, hypotheses and conclusions you draw from your data.
Proposing a novel extension of an existing state-of-the-art algorithm and investigating its performance on a control benchmark. Your contribution would be a clear outline of why and where you think your extension will improve the performance of the underlying algorithm, a thorough quantitative investigation of whether it actually helps, and a short conclusion stating your findings and recommendations regarding your extension.

Project proposal

The project proposal should be a short document (1 page max) that summarizes the project idea, the motivation for the project, and an outline of what you will do to complete it. You should take care to show that you have a clear target (e.g. you want to implement environment x, or you want to build extension y), and explain what steps will be necessary to complete the project.

Note that we will not grade you on whether you achieved everything that was outlined in your proposal, but on the effort and thoroughness of your work. Some ideas fail for unforeseen reasons; in these cases we expect a clear investigation and explanation of the causes for the lack of results.

Project proposal structure

Problem Statement (3/10)

This needs to clearly answer the “What” question.
It should be easy to understand for a broad audience in the area (and more specifically, the TAs ;) ).
Clearly outline input/output
State clearly what tasks you are trying to solve or what kinds of algorithms you are proposing.

Motivation and Impact (3/10)

This should answer the “Why” question.
Why do this? Is there a need that you foresee, or does it answer a previously posed question?
More importantly, it should answer the “So what?” question: what changes if your proposed solution actually works?

Intuition (4/10)

This should answer the “How” question.
State clearly how you are going to design your experiments and what techniques you are using, or what components your novel environments will have.

Report structure

Your report should be cleanly written using LaTeX or an office program. A LaTeX template will be available on the course page.

The report should include the following sections:

Introduction: Briefly introduce your environments and your algorithm, your motivation for choosing these, and an overview of your findings.
Algorithm: Describe the background of your algorithm. Make sure to define and include all necessary mathematical concepts and give an overview of the algorithm in pseudocode. In addition, give some background on related work, e.g. earlier algorithms and later extensions and improvements. Visualize the algorithm or an important part of it yourself (do not simply reproduce the paper figure) and highlight the changes in the extension. Several algorithms also have design choices or different published versions (e.g. target networks in DQN can be updated in a hard or soft fashion). Make sure to point out which version you are using specifically and give a justification for all design choices. Alternatively, use the opportunity to build experiments that compare them ;).
- If you are implementing a novel task or comparing different algorithms, you should proceed similarly. Describe the background and existing work, introduce all algorithms that you are using to solve the task, and describe the design choices you make. For environments, outline the simulators and tools you used and the technical challenges that had to be overcome.
Investigation: This should be the main part of your report. Describe your experimental setup and the aspects of the algorithm you focus on (e.g. hyperparameter investigation, algorithmic improvement, etc.), and include relevant figures. If you are investigating returns, you should always perform several independent runs with different random seeds and report both the mean and the distribution. For example, you might have an algorithm that fails in 10% of the cases but performs well in the others. Visualize this!
Conclusion: Briefly summarize your findings and give concrete recommendations for others who are interested in the algorithm. We plan to share the reports with the class afterwards (if you give consent to this) so you can all benefit from these recommendations.
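
The advice on independent runs can be sketched in a few lines. The example below is only an illustration using the Python standard library: `run_experiment` is a hypothetical stand-in for one full training run (here it simply draws a fake final return, with roughly 10% of runs "failing"), and the number of seeds and the chosen statistics are assumptions, not course requirements.

```python
import random
import statistics
from typing import Dict, List


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a final return."""
    rng = random.Random(seed)  # a separate generator per seed keeps runs independent
    # Placeholder behaviour: about 10% of runs "fail" and end with a low return.
    if rng.random() < 0.1:
        return rng.uniform(0.0, 20.0)
    return rng.uniform(180.0, 200.0)


def summarize(returns: List[float]) -> Dict[str, float]:
    """Report the mean together with statistics that describe the spread."""
    ordered = sorted(returns)
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return {
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


seeds = list(range(20))
returns = [run_experiment(seed) for seed in seeds]
summary = summarize(returns)
for name, value in summary.items():
    print(f"{name:>6}: {value:8.2f}")
```

Reporting quartiles and extremes alongside the mean makes a small fraction of failed runs visible, where the mean alone would hide it. A box plot, a histogram, or per-seed learning curves would show the same information graphically.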

Sometimes you will not be able to reach the desired level of performance, typically because the environment is too difficult or the algorithm is not suited to that particular environment. In this case, please provide a detailed explanation of the likely causes, with accompanying experiments and data to support your hypotheses. While you may still receive full marks, we nonetheless hope that you will be able to obtain some positive results.

We expect your report to be about 5-6 pages. There is no hard minimum or maximum, but please take care to be brief and precise. The length of the report will not impact the grading unless it is missing relevant information or becomes extremely and unnecessarily verbose.

Make sure to include references for all relevant related work and claims in your report. Using a citation tool such as BibTeX can greatly help! At a minimum, include citations for your algorithm and the relevant extensions.

Code

We expect a clean and legible code submission as part of your project. Please provide instructions on how to recreate your experiments in the form of a README or a Jupyter/Colab notebook. Clearly structure and comment your code.

If you use public repositories for algorithms, inspiration or utility functions, please point this out in the code at relevant sections and comment on how you adapted the code to your use case or improved on it. All code should be clean, legible and well documented and should come with clear instructions for installation and execution.
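
One way to make experiments easy to recreate is a single entry point that takes the seed and hyperparameters as arguments and logs them with the results. The sketch below is a minimal illustration using only the Python standard library; the function names, flag names (`--seed`, `--episodes`, `--lr`) and default values are hypothetical, and the training loop itself is omitted.

```python
import argparse
import json
import random
from typing import Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Collect every setting that influences a run in one place."""
    parser = argparse.ArgumentParser(description="Run one experiment (illustrative).")
    parser.add_argument("--seed", type=int, default=0, help="random seed for this run")
    parser.add_argument("--episodes", type=int, default=100, help="number of training episodes")
    parser.add_argument("--lr", type=float, default=1e-3, help="learning rate")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> Dict[str, object]:
    args = parse_args(argv)
    # Seed Python's generator; a real project would also seed any numerical
    # library, learning framework and environment it uses.
    random.seed(args.seed)
    config = dict(vars(args))
    # Print (or save) the full configuration so every result can be traced
    # back to the exact settings that produced it.
    print(json.dumps(config, sort_keys=True))
    # ... training and evaluation would go here ...
    return config


# Example call with explicit arguments; a script would call main() with no
# arguments so that the values are read from the command line instead.
config = main(["--seed", "3", "--episodes", "50"])
```

A README can then list the exact commands (one per seed or configuration) needed to regenerate each figure in the report.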

Grading

Your project will be graded according to the following guidelines:

10% Code:
- Correctness and clarity
90% Report:
- 5% Formalities: correct citations, clean write up, all relevant parts included
- 10% Introduction: easy to understand, clear motivation, short and to the point
- 20% Algorithm: description correct and easy to follow, includes relevant related work
- 40% Experiments: setup is clear, experiments are meaningful and varied and support conclusions, all findings are clearly pointed out and discussed
- 15% Conclusion: core findings are clear, recommendations are clear and relevant
