Here you can see a network train live on some mnist digits. Over time the errors should be approaching 0%.
It will take about 1 minute to get pretty
good at the numbers. I
encourage you to tinker with the code and
see if you can make a better model for this problem.