To get your environment set up, build the kernel (https://github.com/anholt/linux.git vc4 branch), Mesa (git://anongit.freedesktop.org/mesa/mesa) with --with-gallium-drivers=vc4, and piglit (git://anongit.freedesktop.org/git/piglit). For working on the Pi, I highly recommend having a serial cable and doing NFS root so that you don't have to write things to slow, unreliable SD cards.
You can run an existing piglit test that should work, to check your environment: env PIGLIT_PLATFORM=gbm VC4_DEBUG=qir ./bin/shader_runner tests/shaders/glsl-algebraic-add-add-1.shader_test -auto -fbo -- you should see a dump of the IR for this shader, and a pass report. The kernel will make some noise about how it's rendered a frame.
Now the actual work: I've left some of the TGSI opcodes unfinished (SCS, DST, DPH, and XPD, for example), so the driver just aborts when a shader tries to use them. How they work is described in src/gallium/docs/source/tgsi.rst. The TGSI-to_QIR code is in vc4_program.c (where you'll find all the opcodes that are implemented currently), and vc4_qir.h has all the opcodes that are available to you and helpers for generating them. Once it's in QIR (which I think should have all the opcodes you need for this work), vc4_qpu_emit.c will turn the QIR into actual QPU code like you find described in the chip specs.
You can dump the shaders being generated by the driver using VC4_DEBUG=tgsi,qir,qpu in the environment (that gets you 3/4 stages of code dumped -- at times you might want some subset of that just to quiet things down).
Since we've still got a lot of GPU hangs, and I don't have reset wokring, you can't even complete a piglit run to find all the problems or to test your changes to see if your changes are good. What I can offer currently is that you could run PIGLIT_PLATFORM=gbm VC4_DEBUG=norast ./piglit-run.py tests/quick.py results/vc4-norast; piglit-summary-html.py --overwrite summary/mysum results/vc4-norast will get you a list of all the tests (which mostly failed, since we didn't render anything), some of which will have assertion failed. Now that you have which tests were assertion failing from the opcode you worked on, you can run them manually, like PIGLIT_PLATFORM=gbm /home/anholt/src/piglit/bin/shader_runner /home/anholt/src/piglit/generated_tests/spec/glsl-1.10/execution/built-in-functions/vs-asin-vec4.shader_test -auto (copy-and-pasted from the results) or PIGLIT_PLATFORM=gbm PIGLIT_TEST="XPD test 2 (same src and dst arg)" ./bin/glean -o -v -v -v -t +vertProg1 --quick (also copy and pasted from the results, but note that you need the other env var for glean to pick out the subtest to run).
Other things you might want eventually: I do my development using cross-builds instead of on the Pi, install to a prefix in my homedir, then rsync that into my NFS root and use LD_LIBRARY_PATH/LIBGL_DRIVERS_PATH on the Pi to point my tests at the driver in the homedir prefix. Cross-builds were a *huge* pain to set up (debian's multiarch doesn't ship the .so symlink with the libary, and the -dev packages that do install them don't install simultaneously for multiple arches), but it's worth it in the end. If you look into cross-build, what I'm using is rpi-tools/arm-bcm2708/gcc-linaro-arm-linux-gnueabihf-raspbian-x64/bin/arm-linux-gnueabihf-gcc and you'll want --enable-malloc0returnsnull if you cross-build a bunch of X-related packages.