Biopharmaceutical manufacturing is a rapidly growing industry with impact in virtually all branches of medicine. Biomanufacturing processes require close monitoring and control, in the presence of complex bio- process dynamics with many interdependent factors, as well as extremely limited data due to the high cost and long duration of experiments. We develop a novel model-based reinforcement learning framework that can achieve human-level control in low-data environments. The model uses a probabilistic knowledge graph to capture causal interdependencies between factors in the underlying stochastic decision process, leveraging information from existing kinetic models from different unit operations while incorporating real-world experimental data. We then present a computationally efficient, provably convergent stochastic gradient method for policy optimization. Validation is conducted on a realistic application with a multi-dimensional, continuous state variable.