Hi
Basically it is what you are saying, but keep in mind that with ratios like 1:1, 1:2 or 1:3 a lot of sidebands land on the same frequencies. As a result, what you hear at one frequency is the sum of two sidebands of different amplitudes and often opposite polarity, since they come from two different sideband pairs. When you then add harmonics to the modulator's fundamental, those add new sidebands that land on top of the ones generated by the fundamental, so the resulting amplitude gets even more complex.
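To make the "sidebands landing on each other" part concrete, here is a rough sketch for the simplest case, a sine carrier modulated by a sine. It assumes the textbook expansion sin(wc*t + I*sin(wm*t)) = sum over n of J_n(I) * sin((wc + n*wm)*t), and the names (bessel_j, fm_spectrum) and the values (100 Hz, index 2) are just mine for illustration. The signs you get depend on this phase convention, so take the polarity as indicative only.

```python
import math
from collections import defaultdict

def bessel_j(n, x, steps=2000):
    """J_n(x) for integer n, from the integral
    J_n(x) = (1/pi) * integral over [0, pi] of cos(n*tau - x*sin(tau)),
    evaluated with the trapezoid rule."""
    h = math.pi / steps
    # Endpoint terms: tau = 0 gives cos(0), tau = pi gives cos(n*pi).
    total = 0.5 * (1.0 + math.cos(n * math.pi))
    for k in range(1, steps):
        tau = k * h
        total += math.cos(n * tau - x * math.sin(tau))
    return total * h / math.pi

def fm_spectrum(fc, fm, index, max_order=10):
    """Group the sidebands of sin(2*pi*fc*t + index*sin(2*pi*fm*t)) by frequency.
    Returns {frequency: [(order n, signed amplitude), ...]}."""
    bins = defaultdict(list)
    for n in range(-max_order, max_order + 1):
        f = fc + n * fm
        amp = bessel_j(n, index)
        if f == 0:
            continue  # sin(0) is zero, so this component contributes nothing
        if f < 0:
            # sin(-w*t) = -sin(w*t): the sideband folds back with inverted sign
            f, amp = -f, -amp
        bins[round(f, 6)].append((n, amp))
    return dict(bins)

# 1:1 ratio, assumed values: carrier = modulator = 100 Hz, index = 2
for freq, parts in sorted(fm_spectrum(100, 100, 2.0, max_order=6).items()):
    net = sum(a for _, a in parts)
    print(freq, [(n, round(a, 4)) for n, a in parts], "net:", round(net, 4))
```

With these values the 100 Hz bin gets two contributions (orders 0 and -2) of opposite sign, while the 200 Hz bin gets two (orders 1 and -3) of the same sign, so some bins partly cancel and others reinforce.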
Calculating it exactly is going to be really hard, but a rule of thumb will work:
As you add harmonics to the modulator (or the carrier), it is as if each one were a new FM interaction creating its own sidebands. Those harmonics are always at higher frequencies than the fundamental, and because of that the new sidebands tend to appear at higher frequencies too. Typically a sawtooth has energy across quite a few harmonics, and those will generate sidebands "at" their own frequencies, so higher up.
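If you want to check the "new sidebands appear higher up" idea numerically, a small experiment like the following might help. It is only a sketch under my own assumptions: a 1:1 ratio at 100 Hz, phase modulation by a band-limited sawtooth-like sum sin(h*x)/h, an index of 1, and a plain DFT over one period. The two modulators are not level-matched, so read the numbers as a trend and not as a fair A/B.

```python
import cmath
import math

N = 256  # samples per period of the shared 100 Hz fundamental (assumed)

def modulator(phase, harmonics):
    """Band-limited sawtooth-like modulator: sum of sin(h*phase)/h."""
    return sum(math.sin(h * phase) / h for h in range(1, harmonics + 1))

def fm_period(index, harmonics):
    """One period of sin(phase + index * modulator(phase)), 1:1 ratio."""
    out = []
    for k in range(N):
        phase = 2.0 * math.pi * k / N
        out.append(math.sin(phase + index * modulator(phase, harmonics)))
    return out

def harmonic_magnitudes(samples):
    """Magnitude of harmonics 0 .. N/2 - 1 via a direct DFT."""
    n = len(samples)
    mags = []
    for h in range(n // 2):
        acc = sum(s * cmath.exp(-2j * math.pi * h * k / n)
                  for k, s in enumerate(samples))
        mags.append(2.0 * abs(acc) / n)
    return mags

def energy_fraction_above(mags, first_harmonic):
    """Share of the (non-DC) energy at or above the given harmonic number."""
    total = sum(m * m for m in mags[1:])
    high = sum(m * m for m in mags[first_harmonic:])
    return high / total

# Sine modulator (1 harmonic) versus an 8-harmonic sawtooth-like modulator
sine_hi = energy_fraction_above(harmonic_magnitudes(fm_period(1.0, 1)), 6)
saw_hi = energy_fraction_above(harmonic_magnitudes(fm_period(1.0, 8)), 6)
print("energy at 600 Hz and above, sine mod:", sine_hi)
print("energy at 600 Hz and above, saw mod: ", saw_hi)
```

The share of energy at the 6th harmonic and above comes out larger with the richer modulator, which is the trend described above.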
If you look at the sawtooth or square wave (or whatever wave) in an additive way, it is a lot clearer, I think. Each harmonic of the mod "does" its own FM with each harmonic of the carrier, and each one of those interactions has exactly the sound you expect. But the result is the sum of a LOT of those FM interactions landing on top of each other, so it gets hard to predict where you'll have peaks and valleys in the resulting spectrum.
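Here is the additive rule of thumb as a tiny counting exercise. It only tracks where sidebands land, not their amplitudes, and it treats every carrier-harmonic/mod-harmonic pair as an independent simple FM pair, which is the approximation from this post and not an exact spectrum (the exact one also has combination terms). The function name and the values are made up for the example.

```python
from collections import Counter

def pairwise_sideband_counts(fc, fm, carrier_harmonics, mod_harmonics, max_order):
    """Count how many pairwise FM interactions put a sideband on each frequency.

    Rule-of-thumb model: carrier harmonic hc and modulator harmonic hm
    produce sidebands at hc*fc + n*hm*fm for n in -max_order .. max_order.
    """
    counts = Counter()
    for hc in range(1, carrier_harmonics + 1):
        for hm in range(1, mod_harmonics + 1):
            for n in range(-max_order, max_order + 1):
                f = abs(hc * fc + n * hm * fm)  # negative frequencies fold back
                counts[round(f, 3)] += 1
    return counts

# Assumed values: 1:1 ratio at 100 Hz, sideband orders up to 3
simple = pairwise_sideband_counts(100, 100, 1, 1, 3)  # sine on sine
rich = pairwise_sideband_counts(100, 100, 4, 4, 3)    # 4 harmonics each
print("sine/sine, busiest bin:", max(simple.values()), "contributions")
print("rich/rich, busiest bin:", max(rich.values()), "contributions")
print("rich/rich, highest sideband:", max(rich), "Hz")
```

With a sine on a sine, at most two sidebands share a bin. With only four harmonics on each side the busiest bins collect many more, each with its own amplitude and sign, which is why the peaks and valleys get hard to predict.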
Something I find interesting is that higher harmonic content in the mod or carrier tends to add a lot of higher-frequency sidebands, which tends to get "noisy". On the other hand, when you use a non-integer ratio like, say, 1:2.37 with two "rich" waveforms, it tends to hide a little of the "noisy" side of the sideband distribution that comes from the inharmonic ratio.
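For the integer versus non-integer ratio part, a quick way to see the difference in sideband distribution is to list where the folded sidebands land. This is a sketch for sine on sine only, with assumed values (100 Hz carrier, orders up to 5) and hypothetical helper names. It says nothing about how the result sounds, it just shows the positions.

```python
def distinct_sidebands(fc, ratio, max_order):
    """Sorted distinct sideband frequencies |fc + n*ratio*fc| for |n| <= max_order."""
    fm = ratio * fc
    return sorted({round(abs(fc + n * fm), 3)
                   for n in range(-max_order, max_order + 1)})

def harmonic_fraction(freqs, fc, tol=1e-6):
    """Fraction of the given frequencies that are integer multiples of fc."""
    on_series = [f for f in freqs if abs(f / fc - round(f / fc)) < tol]
    return len(on_series) / len(freqs)

for ratio in (2.0, 2.37):
    freqs = distinct_sidebands(100.0, ratio, 5)
    print("1 :", ratio, "->", len(freqs), "distinct frequencies,",
          "share on the 100 Hz harmonic series:", harmonic_fraction(freqs, 100.0))
    print("   ", freqs)
```

With 1:2 the folded sidebands coincide in pairs and all sit on the harmonic series of the carrier. With 1:2.37 every sideband gets its own frequency and almost none of them line up with that series.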