Probes were selected using non-proprietary rules, as described earlier [1]. No mismatch sequences were used in the design of the array. One of the main challenges in selecting specific and effective probes was the high A+T content and low complexity of the P. falciparum genome. Following the release of the complete P. falciparum genome sequence and annotations in October 2002, we validated the design of the array by mapping the probe nucleotide sequences to the sense nucleotide sequences of the predicted coding regions. Of the 5409 published coding sequences downloaded on October 11, 2002, 18 were duplicates and 11 more were subsequences of others. As a consequence, we would not be able to resolve the expression of duplicated genes that share the same sequence. The BLAST analysis showed that, of the 5409 predicted coding sequences, 203 were not represented on the array. In some of these cases the predicted coding sequence was small (48 proteins were less than 100 amino acids in length). Some other sequences were repetitive and of low complexity. Finally, some nucleotide sequences from chromosomes 6 and 7 were unavailable at the time the array was designed.
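The redundancy check on the downloaded coding sequences (exact duplicates, and sequences contained within a longer one) can be illustrated with a minimal sketch. This is not the procedure used for the array validation: the function name find_redundant, the toy sequences, and the convention of counting every copy after the first as a duplicate are assumptions made for illustration only.

```python
def find_redundant(cds):
    """Classify redundant coding sequences.

    cds: dict mapping a sequence identifier to its nucleotide sequence.
    Returns (duplicates, subsequences), two lists of identifiers.
    Assumption: the first occurrence of a sequence is kept, and every
    later identical copy is reported as a duplicate.
    """
    first_seen = {}   # sequence -> identifier of its first occurrence
    duplicates = []
    for name, seq in cds.items():
        seq = seq.upper()
        if seq in first_seen:
            duplicates.append(name)
        else:
            first_seen[seq] = name

    # Among the distinct sequences, flag those fully contained in a longer one.
    unique = list(first_seen.items())
    subsequences = []
    for seq, name in unique:
        if any(len(other) > len(seq) and seq in other for other, _ in unique):
            subsequences.append(name)
    return duplicates, subsequences


# Hypothetical toy data, not real P. falciparum sequences.
example = {
    "geneA": "ATGAAATTTAAATAA",
    "geneB": "ATGAAATTTAAATAA",  # identical to geneA
    "geneC": "AAATTTAAA",        # contained within geneA
    "geneD": "ATGCCCGGGTAA",
}
duplicates, subsequences = find_redundant(example)
print(duplicates, subsequences)
```

The pairwise containment test is quadratic in the number of distinct sequences, which is acceptable for a few thousand coding sequences but would need indexing for much larger sets.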
Reference:
1. K.G. Le Roch, Y. Zhou, S. Batalov, and E.A. Winzeler. Monitoring the chromosome 2 intraerythrocytic transcriptome of Plasmodium falciparum using oligonucleotide arrays. Am. J. Trop. Med. Hyg. 67(3), 2002, pp. 233-243.