Sunday, August 25, 2013

How the Korg Poly-800 DCO works

A while back, I wrote an article here about how the digitally controlled oscillator (DCO) in the Roland Juno synths (the old analog ones, not the Juno-D series) works.  At the time, I had intended to also write something about how the DCO in the Korg Poly-800 (and its rack-mount sibling, the EX-800) works.  I had heard from several sources that it was quite a different design from the Juno DCO, but at the time I wasn't able to find any solid technical information.  And I don't actually own a Poly-800, so I didn't have a guinea pig to experiment on.  (Years and years ago, I tried one out in a music store, and to be honest I was not that impressed.  But back to the topic.)

Recently, I got interested in the topic again, and after a few days, I managed to finally uncover some documentation.  And yes, as it turns out, the Poly-800 DCO is quite different from the Juno.  It actually flirts with the line between "analog" and "digital" a lot more than the Juno DCO (which has a completely analog audio path) does.  And in some ways, it's more capable than the Juno DCO, but in other ways it's quite limited and just plain screwy.

Schematic Archaeology

A few months ago I went to the excellent FDISKC web site and downloaded a copy of the Poly-800 schematics.  I recall looking at this before and not being able to make much sense of it, and it doesn't help that it's a scanned copy of an original that was already in rather poor condition.  But this time, having read up a bit more on the synth's features and its patch programming options, I had a better idea of what to look for.  For those who have not encountered one: The Poly-800 is an eight-voice synth.  Its DCOs generate outputs in four octaves for each voice.  There are 16', 8', 4', and 2' octaves that can be turned on and off individually.  There are two choices of waveform (or so the synth likes to pretend; we'll talk about this later): square and sawtooth.  No pulse width modulation on the squares, and no triangle or sine.  The normal operating mode is a single DCO per voice, but the 800 can be put in a "double" mode wherein two DCOs are allocated to each voice, the penalty being that the synth is reduced to four voices.

When I looked over the schematics, I noticed an IC with the part number MSM5232.  It had two groups of outputs marked as being the four footages mentioned above.  Aha, I thought, that must be the IC that generates a voice, or possibly two voices.  I got to looking for some notation on the schematic that would explain that that part of the circuit was replicated some number of times (4 or 8 was what I expected), but I couldn't find any such.  Also, the IC looked like it was maybe some sort of processor; it had incoming address and data lines.  And then there were eight lines marked as "C1" through "C8".  I couldn't figure out what those were.  A bit of Web searching quickly uncovered that this part was once upon a time made by Oki Electric.  However, Oki Electric spun off its semiconductor business into a separate company some years ago; I think it may have been through several changes of hands since then, and in any event, Oki Semiconductor, if it still exists, doesn't seem to have a Web site.  So no going to the manufacturer for a data sheet.

I saw several mentions of the Poly-800 service manual having the data sheet, but I only turned up a couple of online sources for the manual, and they looked sketchy (they demanded that you disable your firewall and virus protection in order to download).  So no luck there.  After hours of searching, I finally found a several-years-old posting that had a pointer to an Italian site.  I crossed my fingers and clicked.  It was there!  And it explains a lot.  And now I know...

The Original Chiptunes Synth

The reason I couldn't find any block-replication notation on the schematics was that a single MSM5232 handles all eight DCOs.  As it turns out, the MSM5232 wasn't intended to be used in music synthesizers -- it was a tune chip for arcade video games.  It contains eight counters that divide down a pair of master clock inputs, and bit shifters that act like octave dividers and produce all of the different footages.  It also has a sort-of VCA for each voice, and a pair of onboard attack-sustain-release envelope generators.  What it does not have is filters, a problem that we'll get to later.

So here's how it works: Each DCO, as stated above, has a counter-divider that is loaded with a value and then counts down every time the clock signal at the external clock input cycles.  When it reaches zero, it sends a reset pulse, and then its value gets reloaded.  This much is similar to the Juno DCO.  On the Juno, each time the counter reaches zero, the pulse resets a fairly conventional sawtooth VCO core.  However, the 5232 has no VCO core.  Instead, it has a flip-flop that toggles its state on every counter reset -- which means that it is generating a square wave.  That's the only waveform it can produce.

 Each DCO has a register into which the CPU places a note number when the DCO is to play a note, and a gate flag that turns the voice on and off.  The note number is used to look up a counter value from an internal ROM, which will be used to divide down the incoming clock frequency.  The flip-flop controlled by the counter drives a chain of octave dividers which generate the four footage outputs.  Basically, there is only one octave's worth of counter values, and it taps into the octave divider chain in different places for higher or lower octaves. 
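To make the counter / flip-flop / octave-divider chain concrete, here is a minimal Python sketch of one voice.  It is a toy model only: the divisor of 5 and the 400-tick run are made-up illustration values, not figures from the MSM5232 data sheet, and the function names are hypothetical.

```python
# Toy model of one divide-down voice: counter -> toggle flip-flop -> octave dividers.
# The divisor and tick count are illustration values, not MSM5232 data.

def square_from_counter(divisor, clock_ticks):
    """Return the flip-flop state after each clock tick.

    The counter is loaded with `divisor`, counts down once per tick, and
    toggles the flip-flop (then reloads) each time it reaches zero.
    """
    count = divisor
    state = 0
    out = []
    for _ in range(clock_ticks):
        count -= 1
        if count == 0:
            state ^= 1       # toggle on every counter reset
            count = divisor  # reload the counter
        out.append(state)
    return out


def divide_by_two(signal):
    """Octave divider: toggle the output on each rising edge of the input."""
    state = 0
    prev = 0
    out = []
    for s in signal:
        if s == 1 and prev == 0:
            state ^= 1
        prev = s
        out.append(state)
    return out


def period_in_ticks(signal):
    """Distance in clock ticks between the first two rising edges."""
    edges = [i for i in range(1, len(signal))
             if signal[i] == 1 and signal[i - 1] == 0]
    return edges[1] - edges[0]


top = square_from_counter(divisor=5, clock_ticks=400)
footages = [top]
for _ in range(3):
    footages.append(divide_by_two(footages[-1]))

periods = [period_in_ticks(f) for f in footages]
print(periods)  # [10, 20, 40, 80]
```

Each stage doubles the period, so each output sits one octave below the one before it.  Note that the square wave's period is twice the divisor, because it takes two counter resets to complete one full cycle of the flip-flop.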

So far so good.  Now here's where it begins to get screwy.  One would think that the logical way to output the voices from the chip would be to have each voice output on its own output pin.  That's not what it does.  The voices are divided into two groups, and for each group, all of the outputs of a given footage are mixed onto one output pin; for example, all of the 16' footages for voices 1 through 4 come out mixed on pin 28.  This answers a big question that is often asked about this synth: why does it use a paraphonic VCF?  Answer: because the 5232 doesn't make the individual voice outputs available.  The IC provides amplitude control over each voice, but not over the individual footages -- they can only be turned on or off, and the choice applies for all voices in a group.  There are two ASR envelope generators onboard, one for each group, but the Poly-800 does not use them.  Rather, it applies envelopes generated externally by the synth's CPU. These are input to the chip through eight input pins, one for each voice.  I don't think the chip really has VCAs -- I think that all it does is toggle back and forth between the current envelope level and ground, which produces the square wave of the desired amplitude. 

Each group of four voices is driven by its own external master clock input.  The chip itself has no mechanism for any kind of pitch modulation, so pitch bend and envelope/LFO control over pitch have to be implemented external to the chip, by modulating the master clock frequencies.  This is reflected in the Poly-800's architecture; if you put it in the "double" mode, it splits the voices along the two groups and, when detune is selected, drives the groups with clocks of different frequencies.
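Because every voice in a group is just the group clock divided by some count, scaling the clock shifts all of that group's voices by the same musical interval.  The Python sketch below illustrates the arithmetic.  The 2 MHz base clock and the divisors are hypothetical placeholders, and the factor of 2 in the frequency formula assumes the toggle flip-flop described above.

```python
# Sketch: pitch bend / detune by scaling a group's master clock.
# BASE_CLOCK_HZ and the divisors are hypothetical, not values from the Poly-800.

BASE_CLOCK_HZ = 2_000_000.0


def scaled_clock(base_clock_hz, cents):
    """Clock frequency needed to shift pitch by `cents` (100 cents = 1 semitone)."""
    return base_clock_hz * 2.0 ** (cents / 1200.0)


def voice_frequency(clock_hz, divisor):
    """Square-wave frequency from a counter feeding a toggle flip-flop."""
    return clock_hz / (2 * divisor)


detuned_clock = scaled_clock(BASE_CLOCK_HZ, 10)  # 10 cents sharp
for divisor in (478, 379, 319):                  # made-up counts for three notes
    plain = voice_frequency(BASE_CLOCK_HZ, divisor)
    detuned = voice_frequency(detuned_clock, divisor)
    print(round(plain, 2), round(detuned, 2), round(detuned / plain, 6))
```

The last column is the same ratio for every divisor, which is why one modulated clock can bend or detune a whole group at once.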

The drawing below shows the basic signal flows.  (To reduce drawing clutter, only 4 of the 8 voices are shown.)  Each voice consists of a note number register, a counter/divider, and four octave dividers.  To play a note, the synth chooses a voice, writes the desired note number into its note number register, and then sets a flag telling the voice to play.  The note number is used to look up the divide-down count in the ROM, which then goes to the counter/divider.  This divides down the master clock (not shown) for the group that the voice is in (purple or green) and produces the top octave.  The four octave dividers then produce the four footages.

The Mysterious Sawtooth Wave and Alleged Walsh Functions

This leaves a big question: we've established that the MSM5232 is only capable of generating square waves.  But the Poly-800 provides a choice of square or sawtooth waveforms.  How does it do that?  You may have read something about the Poly-800 using a mathematical technique called "Walsh functions" to generate the sawtooth.  What's a Walsh function?  Well, you might know that the process called the "Fourier transform" breaks up a waveform into a set of sine waves that are mixed at different amplitudes.  Walsh functions are like the sine waves used in Fourier analysis: by adding together a set of Walsh functions at different frequencies and amplitudes, you can re-create an arbitrary waveform, within a certain bandwidth.  And that's what the Poly-800 does to approximate a sawtooth wave: it uses the four footages of square wave that the DCO produces to do the inverse Walsh transform equivalent.  When you have the "square" waveform selected for the DCOs, the four square-wave footages are mixed together in equal amounts before the composite signal goes to the filter.  However, when "sawtooth" is selected, the four footages get routed into an analog adder circuit that adds them in a proportion such that the output roughly resembles a sawtooth.  We say "roughly" because trying to do Walsh transforms with only four functions is about like trying to do additive synthesis with only four harmonics.  (Further, it's not true that all of the Walsh functions are square waves; only some of them are, and it takes a more complete set to do a good Walsh transform.)  Nonetheless, it does sort of produce a sawtooth wave.
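To see how four octave-spaced square waves can add up to something sawtooth-like, here is a small Python sketch.  The weights (8, 4, 2, 1, halving with each octave up) are an assumption chosen because they give a clean staircase; they are not values read off the Poly-800 schematic, and the function names are hypothetical.

```python
# Sketch: summing four octave-spaced square waves with halving weights.
# The 8/4/2/1 weighting is an assumption for illustration, not the actual
# resistor ratios in the Poly-800's adder circuit.

def square(step, half_period):
    """+1/-1 square wave sampled at integer `step`, starting high."""
    return 1 if (step // half_period) % 2 == 0 else -1


def staircase(step):
    """Weighted sum of four squares; the lowest octave gets the largest weight."""
    weights_and_half_periods = [(8, 8), (4, 4), (2, 2), (1, 1)]
    return sum(w * square(step, hp) for w, hp in weights_and_half_periods)


one_cycle = [staircase(s) for s in range(16)]
print(one_cycle)
# [15, 13, 11, 9, 7, 5, 3, 1, -1, -3, -5, -7, -9, -11, -13, -15]
```

The result is a 16-step falling ramp that repeats at the rate of the lowest square wave: a sawtooth with a staircase edge.  It also shows why the approximation is only rough, since four squares can never give more than 16 distinct levels per cycle.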

I've still got a lot more digging to do into the schematic.  For one thing, I'd like to be able to identify how the source oscillator that produces the two clock signals for the 5232 works.  It's obviously not a crystal oscillator since it has to be variable in frequency to an extent.  It appears to be based on an LC-type resonant circuit, but that part of the schematic is in particularly bad shape and it's hard to read.  

Saturday, August 24, 2013

Statescape Wisconsin

So as I wrote in my last post, I've been looking for a while for a way to build a delay line that would allow the recirculating sound to interact with the sound being input in a way other than just being mixed together.  I have wanted to explore other ways in which the input sound could modify the sound looping through the line.  One thing I thought of was to build a delay line in which the input amplitude modulates the recirculating signal.  As you might know, for any pair of sine waves that are input to a form of amplitude modulation, the output will contain sine waves at two new frequencies which are the sum and difference of the frequencies of the two inputs.  If you use more complex waveforms, then each pair of component sines contained within the two input signals will produce sum and difference frequencies, which can produce a whole lot of partials in the output.  Here is a block diagram of what I had in mind:

The first question was how to actually build such a delay line, and for me the answer was obvious: my favorite softsynth-building environment, Csound.  I've coded up a number of delay lines in Csound previously, and the only big change here was to incorporate the amplitude modulation function into the feedback loop.  However, getting that to work the way I wanted proved to be more difficult than I thought at first.  To illustrate why, I'll repeat the basic amplitude modulation calculation from my last post:

A = (IG + M) * C

where: M is the modulation signal, C is the carrier signal, IG is the initial gain for the carrier (or, to put it another way, the magnitude of the output when no modulation is present), and A is the amplitude-modulated output.  The problem here is the fact that when you first start up a delay line, it contains no signal.  As you can see in the equation, if the carrier C term is zero, there is no output. So obviously if the AM process is implemented with the delay line feedback as the C term, the sound building process can never get started because no AM output is ever generated. 

So I tried coding it the other way, treating the input signal as the carrier and the delay line feedback as the modulation.  That solves the problem with the delay initially not containing anything; it gets filled with unmodified input signal until something starts wrapping back out of the line and amplitude-modulating with the input.  However, it creates another problem: there has to be an input signal present all the time.  Whenever there isn't, the AM output, and the signal getting fed back into the delay line, gets "blanked".  And that's bad because I've found that, when doing these long-period delay things, it pays to be sparse with the input; if you are playing notes into it all the time, it quickly gets too busy for the listener to make any sense of it.

I thought about going back to the first way, with the delay line feedback as the carrier, but with a software switch that would route unmodified input signal into the line whenever there was no output from the AM processing.  But what I wound up doing was simpler: I computed the modulation both ways and added the results.  This doesn't affect which frequencies are present in the output, only the relative levels.  For the purpose, I decided it was good enough.  And this had the advantage of not going silent whenever one signal or the other wasn't present.
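The start-up problem, and the "compute it both ways and add" fix, can be seen in a stripped-down Python model of the loop.  This is only a sketch: it uses a three-sample delay, a single channel, and no filters, and the function and parameter names are hypothetical rather than taken from the Csound code.

```python
# Sketch: AM inside a delay-line feedback loop, one sample at a time.
# Names and values are illustrative; this is not a port of the Csound instrument.

def am(ig, mod, car):
    """Basic amplitude modulation: A = (IG + M) * C."""
    return (ig + mod) * car


def run_delay(inputs, delay_samples, feedback, both_ways):
    line = [0.0] * delay_samples  # circular buffer, initially silent
    pos = 0
    out = []
    for x in inputs:
        delayed = line[pos]
        fb = delayed * feedback
        # Feedback as carrier, input as modulation: silent while the line is empty.
        fb_as_carrier = am(1.0, x, fb)
        # Input as carrier, feedback as modulation: silent while there is no input.
        in_as_carrier = am(1.0, fb, x) if both_ways else 0.0
        line[pos] = (fb_as_carrier + in_as_carrier) / 2
        pos = (pos + 1) % delay_samples
        out.append(delayed)
    return out


impulse = [1.0] + [0.0] * 9
print(run_delay(impulse, 3, 1.0, both_ways=False))  # all zeros: never gets started
print(run_delay(impulse, 3, 1.0, both_ways=True))
# [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0, 0.125]
```

With only the feedback-as-carrier term, the impulse never makes it into the line.  With both terms summed, the impulse gets in through the input-as-carrier term and keeps recirculating through the feedback-as-carrier term after the input has gone quiet.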

Once that problem was solved, the next problem was to figure out what kind of input signals would produce interesting results.  I tried some standard synth things like PWM leads and pad sounds, and I found out right away that with those harmonically complex sounds, the results degenerated into a particularly nasty-sounding form of noise very quickly.  So I had to have something harmonically simpler.  For this purpose I chose the Kawai K5m additive synth.  This was sort of overkill, but it worked for the purpose.  I built one basic sound with only a few harmonics, and capable of having its harmonic content varied by use of the mod wheel.

I ran into a few problems, including one that I never managed to solve.  The big one was a puzzling popping noise that appears at random times.  I still haven't figured this one out.  Also, I had some problem with subsonics appearing in the output.  To address both of these problems, I added a pair of two-pole Butterworth filters to the algorithm, a low pass and a high pass.  These didn't totally solve the popping problem, which you can still hear in places in the completed track.

As for the results: they were surprisingly musical.  The AM often added notes that I didn't play, and I was pleasantly surprised at how often the added overtones actually worked well with the notes that were played.  Keeping everything harmonically simple helps a lot.  There is a distorted sound that builds up when things get busy; it seems to be characteristic.  All in all, I was fairly pleased.  Now I have to think of what to do for the next delay line.

Listen to Wisconsin here.

And here is the Csound source code for the delay line:

; Basic stereophonic delay line

itimel      = 3.1       ; left channel delay time
itimer      = 4.2       ; right channel delay time
ifbl        = 1.7      ; left channel feedback (halved in the mix below, so keep < 2)
ifbr        = 1.7      ; right channel feedback (same limit)
kcutlo      init 20.0   ; hi-pass for damping subsonics
kcuthi      init 2500.0 ; low-pass for suppressing pops
imodindex   init 10
kleftch     init 3      ; channel # of left channel (right is assumed +1)

afbl        init 0
afbr        init 0

; Get input audio
ainl, ainr  inch kleftch, kleftch+1

; Scale values to -1..+1 range needed by formula
ainlscaled = ainl / 0dbfs
ainrscaled = ainr / 0dbfs
afblscaled = afbl / 0dbfs
afbrscaled = afbr / 0dbfs

; Compute with feedback as carrier and input as modulation, and rescale
amodinl = (1 + imodindex * ainlscaled) * afblscaled * 0dbfs / 2
amodinl butterhp amodinl, kcutlo
amodinl butterlp amodinl, kcuthi
amodinr = (1 + imodindex * ainrscaled) * afbrscaled * 0dbfs / 2
amodinr butterhp amodinr, kcutlo
amodinr butterlp amodinr, kcuthi

; Compute with input as carrier and feedback as modulation, and rescale
amodfbl = (1 + imodindex * afblscaled) * ainlscaled * 0dbfs / 2
amodfbl butterhp amodfbl, kcutlo
amodfbl butterlp amodfbl, kcuthi
amodfbr = (1 + imodindex * afbrscaled) * ainrscaled * 0dbfs / 2
amodfbr butterhp amodfbr, kcutlo
amodfbr butterlp amodfbr, kcuthi

; Push samples through the left and right delay lines
aoutl       delay amodinl+amodfbl, itimel
aoutr       delay amodinr+amodfbr, itimer

; Output direct + delayed audio
            outch kleftch, (ainl+aoutl)*2
            outch kleftch+1, (ainr+aoutr)*2

; Compute feedback for next cycle
afbl        = aoutl * ifbl
afbr        = aoutr * ifbr


Wednesday, August 7, 2013

Amplitude Modulation

I've got a new Statescape to post this weekend.  It relies heavily on amplitude modulation, as I'll explain in the post when I post it.  However, before I do that, I figured this would be a good time to dig into what amplitude modulation is, how it works, and what can be done with it.

So what is amplitude modulation?  Quite simply, it is what you are doing when you feed an LFO or other signal into the control input of a VCA: the amplitude of one signal (the control signal) is modulating the amplitude of another signal (the audio input to the VCA).  We do this all the time without thinking about it as "AM" as such.  However, most of the time, when we do this we are using very slow control signals -- well below audio frequencies.  Because of this, we don't usually hear the spectral artifacts that AM creates.  If we hear them at all, we hear them as a beating or phasing effect rather than as a separate tone.

However, we can use an audio-frequency signal as the modulation.  When we do, we find that we no longer hear the modulation as a variation of the output level of the carrier; what we hear instead is the carrier with the addition of "sideband" tones generated by the AM process.  Consider the simple case where the carrier and modulation are both sine waves.  What will be heard as the output of the AM process are three tones: a tone at the carrier frequency, and two sideband tones having frequencies which are, if the carrier frequency is CF and the modulation frequency is MF:

CF - MF and CF + MF

So if the carrier frequency is, say, 500 Hz, and the modulation frequency is 220 Hz, the two added tones will come out at 280 Hz and 720 Hz.  Obviously, these frequencies are not harmonically related to the carrier signal or to each other.  Such will usually be the case with AM; the generated tones will be inharmonic more often than not.  The audible effect is to produce sounds that are often described as bell-like, percussive, noisy, or just plain weird.  If the carrier and/or modulation are more complex signals with many harmonic overtones, each harmonic of the carrier will play off of each harmonic of the modulation and generate a pair of sideband tones.  The result becomes cluttered pretty quickly, which is why, when playing with AM, it is often better to start with harmonically simpler signals.
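The 500 Hz / 220 Hz example can be checked numerically.  The Python sketch below builds one second of an amplitude-modulated sine and measures the level at a few frequencies with a single-frequency DFT.  The sample rate, the initial gain of 1, and the helper names are assumptions made for the illustration.

```python
# Sketch: verify the AM sidebands for a 500 Hz carrier and 220 Hz modulation.
# Sample rate, gain, and helper names are illustrative assumptions.
import math


def sidebands(carrier_hz, mod_hz):
    """Difference and sum frequencies (the difference is folded to be positive)."""
    return abs(carrier_hz - mod_hz), carrier_hz + mod_hz


def tone_level(samples, freq_hz, sample_rate):
    """Amplitude of one frequency component; a unit sine at freq_hz reads 1.0."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


sample_rate = 8000
cf, mf, ig = 500, 220, 1.0
# One second of signal, so each test frequency completes a whole number of cycles.
signal = [(ig + math.sin(2 * math.pi * mf * i / sample_rate))
          * math.sin(2 * math.pi * cf * i / sample_rate)
          for i in range(sample_rate)]

low, high = sidebands(cf, mf)
for f in (low, cf, high, mf):
    print(f, round(tone_level(signal, f, sample_rate), 3))
```

The carrier at 500 Hz comes out at full level, the 280 Hz and 720 Hz sidebands at half level each, and nothing remains at the 220 Hz modulation frequency itself.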

(What, you might ask, happens if the carrier frequency is 220 Hz and the modulation is 500 Hz?  Well, the "negative frequency" values become aliased -- they come out as real tones, but with opposite phase.  In this example, we'd get a "real" frequency of 720 Hz and a "negative" frequency of -280 Hz.  The 280 Hz sideband will in fact be there, but its phase will be the opposite of what it was in the first example.)

In a conventional AM setup (as would be used by a radio station broadcasting an AM signal), an initial gain is assigned to the carrier and the modulation varies this gain by being added to or subtracted from it.  The sum or difference of the modulation and the initial gain is what modulates the carrier.  The effect of this is to set the output level of the carrier when there is no modulation.  The instantaneous value of the modulation increases or decreases the initial gain, depending on how the modulation wiring is set up.  Ring modulation is actually just a special case of amplitude modulation, in which the initial gain of the carrier is zero.  Those who have played with a proper ring modulator (one that has both carrier and modulation inputs) know that if you don't put anything into the modulation input, you get nothing out.  This is why.

The basic amplitude modulation equation is:

A = (IG + M) * C

where: M is the modulation signal, C is the carrier signal, IG is the initial gain for the carrier (or, to put it another way, the magnitude of the output when no modulation is present), and A is the amplitude-modulated output.   The multiplying of the carrier and modulation signals is a characteristic of all amplitude modulation methods.  Don't confuse this with the effect in the frequency domain (where the frequencies are added, as discussed above); in the time domain, the signals multiply.  As you can see, if the initial gain is zero, the computation reduces to a straight multiplication of the two signals, which is what ring modulation is.  You can also see another characteristic of ring modulation: the carrier and the modulation are interchangeable; switching the two inputs of a proper ring modulator will produce the same result. 
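A few lines of Python make the ring modulation special case easy to check.  The sample values are arbitrary, and the function name am is just a label for the equation above.

```python
# Sketch: A = (IG + M) * C, with ring modulation as the IG = 0 case.
# Sample values are arbitrary illustration numbers.

def am(ig, m, c):
    """Amplitude modulation of carrier sample c by modulation sample m."""
    return (ig + m) * c


m, c = 0.25, -0.5

# Ring modulation (IG = 0): swapping the inputs gives the same output.
print(am(0.0, m, c), am(0.0, c, m))  # -0.125 -0.125

# Conventional AM (IG = 1): the two inputs are no longer interchangeable.
print(am(1.0, m, c), am(1.0, c, m))  # -0.625 0.125

# No modulation: ring mod is silent, while AM passes the carrier at the initial gain.
print(am(0.0, 0.0, c), am(1.0, 0.0, c))  # -0.0 -0.5
```

The -0.0 in the last line is just floating-point zero with a sign attached; the ring modulator's output is silent.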

Amplitude modulation can be easily accomplished in both the analog and digital domains.  In the digital world, if you have access to something that allows you to run formulas on samples, like Csound or Max/MSP, it's pretty easy as shown by the above equation.  In the analog domain, you need a "four quadrant" VCA or a ring modulator.  With the latter, you can set the initial gain (if desired) by using a voltage source and adding it to the modulation with a DC-enabled mixer.  (Note: This may not work with a diode-ring-type ring modulation circuit.  I don't have one to try it with, so I don't know.  It should work with most any four-quadrant VCA.)  Because of the creation of the inharmonic sideband tones, you want to keep the signals you use harmonically simple, because complex waveforms tend to deteriorate into undifferentiated noise pretty quickly.  Also be prepared to do some low-pass filtering to get rid of any excessively high frequency tones that are generated.