BerkeleyRecruiter Since 2001
the smart solution for Berkeley jobs


Company: Alignment Research Center
Location: Berkeley
Posted on: November 18, 2023

Job Description:

What is ARC's Theory team?

The Alignment Research Center (ARC) is a non-profit whose mission is to align future machine learning systems with human interests. The high-level agenda of the Theory team (not to be confused with ARC's other team) is described by the report on Eliciting Latent Knowledge (ELK): roughly speaking, we're trying to design ML training objectives that incentivize systems to honestly report their internal beliefs.

For the last year or so, we've mostly been focused on an approach to ELK based on formalizing a kind of heuristic reasoning that could be used to analyze neural network behavior, as laid out in our paper on the topic. Our research has reached a stage where we're coming up against concrete problems in mathematics and theoretical computer science, so we're particularly excited about hiring researchers with relevant background, regardless of whether they have worked on AI alignment before. See below for further discussion of ARC's current theoretical research directions.

Who is ARC looking to hire?

We now have more of a need than before for people with a strong theoretical background (in math, physics or computer science, for example), but we remain open to anyone who is excited about getting involved in AI alignment, even if they do not have an existing research record.

Ultimately, we are excited to hire people who could contribute to our research agenda.
The best way to figure out whether you might be able to contribute is to take a look at some of our recent research problems and directions:

- Some of our research problems are purely mathematical. Note that the examples we have highlighted are unusually difficult, self-contained and well-posed (making them more appropriate for prizes).
- Some of our other research is more informal, as described in some of our recent writing.
- A lot of our research occupies a middle ground between fully formalized problems and more informal questions, such as fixing the problems with cumulant propagation described in Appendix D of our paper.

What is working on ARC's Theory team like?

ARC's Theory team currently has a team lead and 2 other permanent team members, alongside a varying number of temporary team members (recently anywhere from 0 to 3).

Most of the time, team members work on research problems independently, with frequent check-ins with their research advisor (e.g., twice weekly). The problems described above give a rough indication of the kind of research problems involved, which we would typically break down into smaller, more manageable subproblems. This work is often somewhat similar to academic research in pure math or theoretical computer science.

In addition, we allocate a significant portion of our time to higher-level questions surrounding research prioritization, which we often discuss at our weekly group meeting. Since the team is still small, we are keen for new team members to help with this process of shaping and defining our research.

ARC shares an office with several other groups working on AI alignment, so even though the Theory team is small, the office is lively with lots of AI alignment-related discussion.

What are ARC's current theoretical research directions?

ARC's main theoretical focus over the last year or so has been on preparing our paper and on follow-up work to it.
Roughly speaking, we're trying to develop a framework for "formal heuristic arguments" that can be used to reason about the behavior of neural networks. This framework can be thought of as a confluence of two existing approaches:

- Mechanistic interpretability: uncertain and defeasible, but not machine verifiable
- Formal proof: machine verifiable, but restricted to strictly confident conclusions
- Formal heuristic argument (our approach): uncertain and defeasible, and also machine verifiable

This research direction can be framed in a couple of different ways:

- As a formalization of mechanistic interpretability: Mechanistic interpretability is a research field seeking to reverse-engineer the weights of neural networks into human-understandable programs. A number of the field's central concepts, such as a "feature", are currently defined informally. Putting the field on a more formal footing could bring clarity to its methods and goals, remove the need to have humans or human-like systems in the loop, and elucidate how interpretability could be applied to solve downstream problems.
- As a way of dealing with out-of-distribution generalization failures: We think that a formal heuristic argument that explains a neural network's training set performance could be used to flag new datapoints that trigger unusual behavior inside the model. We have been calling this approach "mechanistic anomaly detection", since it can be thought of as a way to detect anomalies in the model's internal activations at inference time.

Hiring process

Our current interview process involves:

- 3-hour take-home test involving math and computer science puzzles
- 30-minute non-technical phone call
- 1-day onsite interview

We will compensate candidates for their time when this is logistically possible.

We will keep applications open until at least the end of August 2023, and will aim to get a final decision back within 6 weeks of receiving an application.

Employment details

ARC is based in Berkeley, California, and we would prefer people who can work full-time from our office, but we are open to discussing remote or part-time arrangements in some circumstances. We can sponsor visas and are H-1B cap-exempt.

We are accepting applications for both visiting researcher (1-3 months) and full-time positions. The intention of the visiting researcher position is to assess potential fit for a full-time role, and we expect to invite around half of visiting researchers to join full-time. We are also able to offer straight-to-full-time positions, but we anticipate that we will only be able to do this for people with a legible research track record.

Salaries are in the $150k-400k range for most people, depending on experience.

Further information

If you have any questions about anything in this posting, please email ARC. If you want to provide any feedback, you can use ARC's feedback form.

Keywords: Alignment Research Center, Berkeley, Researcher, Other, Berkeley, California
