A couple of recent posts on Ars Technica and TUAW pointed out that Apple is embedding personal information, such as the name and email address of the purchaser, in all of their AAC files (including the DRM-free ones). We got curious, and wondered whether Apple might also be watermarking the underlying audio data in these tracks.
We've found that there isn't a watermark in the compressed audio signal itself, but there are surprisingly large differences in the encoded files, much bigger than different tags, or even different signed/encrypted tags, would explain.
We compared two DRM-free copies of the track Daftendirekt by Daft Punk. When decoded to PCM/WAV data, both copies produced an identical audio signal (the MD5sum is e40b006497f9b417760ca5015c3fa937). So there is no audio watermark. But one of the .m4a files is almost 360K larger than the other!
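For readers who want to repeat this kind of check, here is a minimal sketch in Python. It assumes both tracks have already been decoded to WAV with a decoder of your choice, and it hashes only the PCM sample frames rather than the whole WAV file, so its digest will not necessarily match the MD5sum quoted above. The function names `pcm_md5` and `same_audio` are our own illustrative choices, not part of any tool we used.

```python
import hashlib
import wave


def pcm_md5(wav_path, chunk_frames=65536):
    """Return the MD5 hex digest of the raw PCM frames in a WAV file.

    Only the sample data is hashed; the WAV header is skipped, so two
    files with identical audio hash the same even if headers differ.
    """
    digest = hashlib.md5()
    with wave.open(wav_path, "rb") as wav:
        while True:
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            digest.update(frames)
    return digest.hexdigest()


def same_audio(wav_a, wav_b):
    """True if both WAV files carry byte-identical PCM sample data."""
    return pcm_md5(wav_a) == pcm_md5(wav_b)
```

Calling `same_audio("copy1.wav", "copy2.wav")` (hypothetical file names) returns `True` only when the decoded samples match byte for byte, which is the condition we relied on to rule out an audio watermark.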
We haven't finished examining these differences yet, and we don't have in-house expertise on MPEG codecs, but some of the differences have an intriguing amount of structure. There's a region (around offset 0x11470 in the Daft Punk track, for example) where both files contain what look like tables with the same sequential indices but different data in each file's table.
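A rough way to locate regions like this is to compare the two files byte by byte and then dump the bytes around an offset of interest. The sketch below is a naive illustration, not the method we used: it compares bytes at the same offset only, so once one file has extra bytes inserted, everything after that point will show up as different. The helper names `differing_runs` and `hex_window` are hypothetical.

```python
def differing_runs(data_a, data_b):
    """Return half-open (start, end) offset ranges where the inputs differ.

    Bytes are compared at identical offsets, up to the shorter length.
    """
    runs = []
    start = None
    common = min(len(data_a), len(data_b))
    for offset in range(common):
        if data_a[offset] != data_b[offset]:
            if start is None:
                start = offset  # a new run of differences begins here
        elif start is not None:
            runs.append((start, offset))
            start = None
    if start is not None:
        runs.append((start, common))
    return runs


def hex_window(data, offset, length=64, width=16):
    """Format `length` bytes starting at `offset` as hex dump lines."""
    lines = []
    end = min(len(data), offset + length)
    for row in range(offset, end, width):
        chunk = data[row:min(row + width, end)]
        lines.append(f"{row:08x}  {chunk.hex(' ')}")
    return "\n".join(lines)
```

With each file read into memory via `open(path, "rb").read()`, `differing_runs` lists the mismatching ranges, and `hex_window(data, 0x11470)` prints the bytes at the offset mentioned above so the two copies can be compared side by side.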
We'll post again if we learn more about what's going on here. In the meantime, some pure speculation: it may be that large amounts of iTunes library data are present in each file. It's also possible that Apple has found a way to watermark the AAC encoding itself, such that users would need to either crack the watermark or transcode the audio signal in order to produce a file that does not identify them as the source.