I've been trying to do some reading so I can help you out with this problem, QH. Oh man, am I ever feeling swamped right now. I'm reading through Goldstein's "Classical Mechanics" and Dirac's "The Principles of Quantum Mechanics", some of the most popular books on these topics, and even then the material feels sorely lacking. I'm still working on it and doing my utmost to avoid reading these books from start to finish, but I might just have to if I can't find what I'm looking for. It sucks to be away from university at a time like this, when I'd have so many people I could ask.
Anyhow, I do believe there is a method of deriving momentum as an operator from Hamiltonian mechanics, with the additional assumption of the correspondence principle connecting classical to quantum mechanics. In the formalism of canonical transformations, it's easy to show that using momentum as the generating function of an infinitesimal canonical transformation produces an infinitesimal translation in the coordinates, which is why momentum is called the generator of translations. Classically (I'm working in 1 dimension for simplicity), using the Poisson bracket $$\{u,v\}=\frac{\partial u}{\partial x}\frac{\partial v}{\partial p}-\frac{\partial u}{\partial p}\frac{\partial v}{\partial x},$$ you get, to first order in dx, $$f(x+dx)=f(x)+dx\{f,p\}=f(x)+dx\frac{\partial f}{\partial x},$$ which works out as expected. As you probably know, Poisson brackets correspond to quantum commutators via the relation $$[\hat{u},\hat{v}]=i\hbar\{u,v\}.$$ However, I can't yet see how to go from working with functions of x to working with position bras and kets. Dirac does some funny stuff, taking derivatives of state kets with respect to x, which doesn't make any sense to me whatsoever, so I have to look into it in more detail.
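If it helps, here's a quick numerical sanity check of the classical statement. It's only a sketch: the phase-space function `f` is an arbitrary, hypothetical choice, the Poisson bracket is approximated by central finite differences with the convention $$\{u,v\}=\partial_x u\,\partial_p v-\partial_p u\,\partial_x v$$, and the check is only to first order, so the two sides should differ by something of order dx².

```python
import math


def poisson_bracket(u, v, x, p, h=1e-5):
    """Approximate {u, v} = du/dx dv/dp - du/dp dv/dx at (x, p) by central differences."""
    du_dx = (u(x + h, p) - u(x - h, p)) / (2 * h)
    du_dp = (u(x, p + h) - u(x, p - h)) / (2 * h)
    dv_dx = (v(x + h, p) - v(x - h, p)) / (2 * h)
    dv_dp = (v(x, p + h) - v(x, p - h)) / (2 * h)
    return du_dx * dv_dp - du_dp * dv_dx


def momentum(x, p):
    """Generating function G(x, p) = p."""
    return p


def f(x, p):
    """Hypothetical phase-space function, chosen only for illustration."""
    return math.sin(x) * p**2 + x**3


x0, p0, dx = 0.7, 1.3, 1e-4

# Canonical bracket {x, p} should come out as 1.
canonical = poisson_bracket(lambda x, p: x, momentum, x0, p0)

# {f, p} should reproduce df/dx = cos(x) p^2 + 3 x^2.
bracket = poisson_bracket(f, momentum, x0, p0)
exact_dfdx = math.cos(x0) * p0**2 + 3 * x0**2

# f(x + dx) versus the first-order formula f(x) + dx {f, p}.
lhs = f(x0 + dx, p0)
rhs = f(x0, p0) + dx * bracket

print(f"{{x, p}} = {canonical:.8f}")
print(f"{{f, p}} = {bracket:.8f}, df/dx = {exact_dfdx:.8f}")
print(f"f(x+dx) - [f(x) + dx{{f,p}}] = {lhs - rhs:.2e}  (second order in dx)")
```

Halving `dx` should cut the last mismatch by roughly a factor of four, which is just the "to first order in dx" caveat made concrete.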
Man, now this is getting me all worked up too, because I recently decided that now is the time to finally learn all the step-by-step historical foundations of QM. In undergrad we were spoon-fed the Dirac formalism without going over how it was derived from classical mechanics, and even the graduate course I took did little to improve on this. I've learned a lot already, like the Sommerfeld model and how quantum numbers were originally discovered, where Bohr's magic $$\frac{h}{2\pi}$$ comes from, where Heisenberg got his transition probabilities from, and much more, but I still have such an incredibly long way to go!
I really hope someone here knows how to go from the classical generator of translations to the quantum one. I'm quite certain there's a direct connection, but I just can't see it at the moment. If we could just get to that step and derive $$\mathcal{T}(d\vec{x})=1-\frac{i}{\hbar}\hat{\vec{p}}\cdot d\vec{x},$$ then from that point I could easily show you where the position-representation relationship $$\hat{\vec{p}}\mapsto -i\hbar\vec{\nabla}$$ comes from.
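For what it's worth, here's a rough sketch of that last step in 1D, where the translation operator reads $$\mathcal{T}(dx)=1-\frac{i}{\hbar}\hat{p}\,dx$$. It assumes the usual conventions $$\mathcal{T}(dx)|x\rangle=|x+dx\rangle$$ and $$\psi_\alpha(x)\equiv\langle x|\alpha\rangle$$, uses completeness of the position kets, and keeps only terms to first order in dx:

$$\begin{aligned}
\mathcal{T}(dx)|\alpha\rangle &= \int dx'\,|x'+dx\rangle\langle x'|\alpha\rangle = \int dx'\,|x'\rangle\langle x'-dx|\alpha\rangle \\
\Rightarrow\quad \langle x|\mathcal{T}(dx)|\alpha\rangle &= \psi_\alpha(x-dx) \approx \psi_\alpha(x) - dx\,\frac{\partial\psi_\alpha}{\partial x}.
\end{aligned}$$

Expanding the same matrix element with the infinitesimal form of the operator instead gives $$\langle x|\mathcal{T}(dx)|\alpha\rangle=\psi_\alpha(x)-\frac{i}{\hbar}\,dx\,\langle x|\hat{p}|\alpha\rangle,$$ and comparing the two first-order terms:

$$\langle x|\hat{p}|\alpha\rangle = -i\hbar\,\frac{\partial}{\partial x}\langle x|\alpha\rangle,$$

which is the 1D version of $$\hat{p}\mapsto -i\hbar\frac{\partial}{\partial x}$$ acting on wavefunctions. The "derivative of a ket with respect to x" business is hiding in that middle step, where the shift in the ket label gets moved onto the wavefunction's argument.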