Researcher
Company: Alignment Research Center
Location: Berkeley
Posted on: November 18, 2023
Job Description:
What is ARC's Theory team?

The Alignment Research Center (ARC) is
a non-profit whose mission is to align future machine learning
systems with human interests. The high-level agenda of the Theory
team (not to be confused with the team) is described by the report on
(ELK): roughly speaking, we're trying to design ML training
objectives that incentivize systems to honestly report their
internal beliefs.

For the last year or so, we've mostly been focused
on an approach to ELK based on formalizing a kind of heuristic
reasoning that could be used to analyze neural network behavior, as
laid out in our paper on . Our research has reached a stage where
we're coming up against concrete problems in mathematics and
theoretical computer science, and so we're particularly excited
about hiring researchers with relevant background, regardless of
whether they have worked on AI alignment before. See below for
further discussion of ARC's current theoretical research
directions.

Who is ARC looking to hire?

Compared to our , we have
more of a need for people with a strong theoretical background (in
math, physics or computer science, for example), but we remain open
to anyone who is excited about getting involved in AI alignment,
even if they do not have an existing research record.

Ultimately, we
are excited to hire people who could contribute to our research
agenda. The best way to figure out whether you might be able to
contribute is to take a look at some of our recent research
problems and directions:

- Some of our research problems are purely mathematical, such as
  these - although note that these are unusually difficult,
  self-contained and well-posed (making them more appropriate for
  prizes).
- Some of our other research is more informal, as described in some
  of our recent such as .
- A lot of our research occupies a middle ground between
  fully-formalized problems and more informal questions, such as
  fixing the problems with cumulant propagation described in
  Appendix D of .

What is working on ARC's Theory team like?

ARC's Theory team is led by and
currently has 2 other permanent team members, and , alongside a
varying number of temporary team members (recently anywhere from
0-3).

Most of the time, team members work on research problems
independently, with frequent check-ins with their research advisor
(e.g., twice weekly). The problems described above give a rough
indication of the kind of research problems involved, which we
would typically break down into smaller, more manageable
subproblems. This work is often somewhat similar to academic
research in pure math or theoretical computer science.

In addition
to this, we also allocate a significant portion of our time to
higher-level questions surrounding research prioritization, which
we often discuss at our weekly group meeting. Since the team is
still small, we are keen for new team members to help with this
process of shaping and defining our research.

ARC shares an office
with several other groups working on AI alignment such as , so even
though the Theory team is small, the office is lively with lots of
AI alignment-related discussion.

What are ARC's current theoretical research directions?

ARC's main theoretical focus over the last year
or so has been on preparing the paper and on follow-up work to
that. Roughly speaking, we're trying to develop a framework for
"formal heuristic arguments" that can be used to reason about the
behavior of neural networks. This framework can be thought of as a
confluence of two existing approaches:

- Mechanistic interpretability: uncertain and defeasible, but not
  machine verifiable
- Formal proof: machine verifiable, but strictly confident only
- Formal heuristic argument (our approach): uncertain and
  defeasible, and machine verifiable

This research direction can be framed in a couple of different ways:

- As a formalization of
mechanistic interpretability: Mechanistic interpretability is a
research field seeking to reverse-engineer the weights of neural
networks into
human-understandable programs. A number of the field's central
concepts, such as a "feature", are currently defined informally.
Putting the field onto more of a formal footing could bring clarity
to the methods and goals of the field, remove the need to have
humans or human-like systems in the loop, and elucidate how
interpretability could be applied to solve downstream problems.
- As
a way of dealing with out-of-distribution generalization failures:
We think that a formal heuristic argument that explains a neural
network's training set performance could be used to flag new
datapoints that trigger unusual behavior inside the model. We have
been calling this approach "mechanistic anomaly detection", since
it can be thought of as a way to detect anomalies in the model's
internal activations at inference time. Further details are given
in this .

Hiring process

Our current interview process involves:

- 3-hour take-home test involving math and computer science puzzles
- 30-minute non-technical phone call
- 1-day onsite interview

We will compensate candidates for their time when this is
logistically possible.

We will keep applications open until at least the end of
August 2023, and will aim to get a final decision back within 6
weeks of receiving an application.

Employment details

ARC is based in
Berkeley, California, and we would prefer people who can work
full-time from our office, but we are open to discussing remote or
part-time arrangements in some circumstances. We can sponsor visas
and are H-1B cap-exempt.

We are accepting applications for both
visiting researcher (1-3 months) and full-time positions. The
intention of the visiting researcher position is to assess
potential fit for a full-time role, and we expect to invite around
half of visiting researchers to join full-time. We are also able to
offer straight-to-full-time positions, but we anticipate that we
will only be able to do this for people with a legible research
track-record.

Salaries are in the $150k-400k range for most people
depending on experience.

Further information

If you have any questions about anything in this posting, please
email .

If you want to provide any feedback, you can use this form: