A program fails. Under which circumstances does the failure occur? Starting from a run that exhibits a particular behavior, our \textsc{Alhazen}\xspace approach automatically determines the input features associated with that behavior: (1) We use a \emph{grammar} to parse the input into individual elements. (2) We use a decision tree learner to \emph{observe} and \emph{learn} which input elements are associated with the behavior in question. (3) We use the grammar to \emph{generate additional inputs} that further strengthen or refute the learned associations, which we treat as hypotheses. (4) By repeating steps 2~and~3, we obtain a \emph{theory} that explains and predicts the given behavior. In our evaluation using inputs for \texttt{find}, \texttt{grep}, \texttt{NetHack}, and a JavaScript transpiler, the theories produced by \textsc{Alhazen} \emph{predict} and \emph{produce} failures with high accuracy and allow developers to \emph{focus} on a small set of input features: ``\texttt{grep} fails whenever the \texttt{-{}-fixed-strings} option is used in conjunction with an empty search string.''
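
The learn-and-refine loop of steps 2 to 4 can be illustrated with a minimal Python sketch. It is not the \textsc{Alhazen} implementation, and everything in it is an assumption made for illustration: the three Boolean feature names, the stand-in \texttt{toy\_program\_fails} (which mimics the \texttt{grep} example), the small Gini-based tree learner, and the fixed fill strategy that replaces grammar-based input generation. Step 1 (parsing inputs with a grammar) is skipped; inputs are given directly as feature dictionaries.

```python
import itertools
from collections import Counter

# Hypothetical Boolean input features (not taken from the paper).
FEATURES = ("fixed_strings", "empty_pattern", "ignore_case")

def toy_program_fails(inp):
    """Stand-in for the program under test: fails like the grep example."""
    return bool(inp["fixed_strings"] and inp["empty_pattern"])

def gini(labels):
    """Gini impurity of a non-empty or empty list of Boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def learn_tree(samples, features):
    """Learn a tree from (input, failed) pairs.

    A leaf is a bool (predicted failure); an inner node is a tuple
    (feature, subtree_if_true, subtree_if_false).
    """
    labels = [failed for _, failed in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split_cost(feature):
        yes = [lab for inp, lab in samples if inp[feature]]
        no = [lab for inp, lab in samples if not inp[feature]]
        return len(yes) * gini(yes) + len(no) * gini(no)

    best = min(features, key=split_cost)
    yes = [(inp, lab) for inp, lab in samples if inp[best]]
    no = [(inp, lab) for inp, lab in samples if not inp[best]]
    if not yes or not no:  # no feature separates the samples
        return majority
    rest = tuple(f for f in features if f != best)
    return (best, learn_tree(yes, rest), learn_tree(no, rest))

def predict(tree, inp):
    """Follow the tree down to a leaf and return its prediction."""
    while not isinstance(tree, bool):
        feature, if_true, if_false = tree
        tree = if_true if inp[feature] else if_false
    return tree

def paths(tree, constraints=()):
    """Yield the feature constraints along each root-to-leaf path."""
    if isinstance(tree, bool):
        yield dict(constraints)
    else:
        feature, if_true, if_false = tree
        yield from paths(if_true, constraints + ((feature, True),))
        yield from paths(if_false, constraints + ((feature, False),))

def generate(tree):
    """For each path, build inputs that satisfy it (step 3, simplified):
    unconstrained features are set once to all-False and once to all-True."""
    new_inputs = []
    for constraints in paths(tree):
        for fill in (False, True):
            inp = {f: fill for f in FEATURES}
            inp.update(constraints)
            new_inputs.append(inp)
    return new_inputs

def explain(seed, rounds=4):
    """Alternate learning (step 2) and generation (step 3), as in step 4."""
    observed = {}  # de-duplicated observations, keyed by feature values

    def run(inp):
        key = tuple(inp[f] for f in FEATURES)
        observed[key] = (inp, toy_program_fails(inp))

    run(seed)
    tree = learn_tree(list(observed.values()), FEATURES)
    for _ in range(rounds):
        for inp in generate(tree):
            run(inp)
        tree = learn_tree(list(observed.values()), FEATURES)
    return tree

# Start from a single failing run and refine the theory.
seed = {"fixed_strings": True, "empty_pattern": True, "ignore_case": False}
tree = explain(seed)
all_inputs = [dict(zip(FEATURES, bits))
              for bits in itertools.product((False, True), repeat=len(FEATURES))]
print(tree)
```

In this toy setting, the learned tree can be checked against the stand-in program on every input in \texttt{all\_inputs}; a realistic setting would instead derive features from grammar elements and draw new inputs from the grammar.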