Ocean data assimilation is an effective method for integrating ocean observations into numerical ocean models. Because assimilated data are closer to the true state of the ocean, they are of great significance for understanding and studying the ocean. We design a general parallel implementation of ocean data assimilation based on domain decomposition, and further propose a new parallel algorithm based on IO proxies. First, the IO proxy processes read the data in parallel. They then split the data into blocks and send each block to the corresponding computation process. After local data assimilation is complete, the IO proxy processes collect the local assimilation results from the computation processes and write them to disk. The main advantage of this method is that only the IO proxy processes perform IO, rather than all processes participating in IO (direct parallel IO). This prevents a large number of processes from accessing the disk simultaneously and thus avoids the waiting caused by processes queuing for disk access. Tests on the Tianhe-2 supercomputer show that, for the assimilation of data at 1° resolution on 425 cores, the total running time of the proposed parallel implementation is 9.1 s, nearly 38 times faster than the traditional serial program. In addition, for the assimilation of data at 0.1° resolution, the parallel algorithm using IO proxies still scales well on 10,000 cores, and its IO time is at most 1/9 of the direct parallel IO time.
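The IO proxy workflow (proxy read, split into blocks, scatter to computation processes, local assimilation, gather, proxy write) can be sketched in a single Python process. This is a minimal illustration under stated assumptions, not the authors' implementation. MPI messages are replaced by in-process function calls. Each hypothetical proxy owns one latitude band and serves a fixed number of longitude blocks. `local_assimilation` is a placeholder pointwise nudging step, not the assimilation scheme used in this work, so no halo exchange between neighbouring blocks is needed. The names `split_range`, `local_assimilation`, `proxy_assimilate` and the disk-access counter are illustrative only.

```python
from typing import List, Tuple

Grid = List[List[float]]


def split_range(n: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def local_assimilation(bg: Grid, obs: Grid, gain: float) -> Grid:
    """Placeholder analysis step (assumption): nudge background toward observations pointwise."""
    return [[b + gain * (o - b) for b, o in zip(brow, orow)]
            for brow, orow in zip(bg, obs)]


def proxy_assimilate(background: Grid, observations: Grid,
                     n_proxies: int, blocks_per_proxy: int,
                     gain: float = 0.5) -> Tuple[Grid, int]:
    """Simulate IO-proxy parallel assimilation in one process.

    Each hypothetical proxy owns one latitude band: it reads that band of both
    input fields, splits it into longitude blocks for its (simulated)
    computation processes, gathers the analysed blocks and writes the band once.
    Returns the analysis field and the number of simulated disk accesses.
    """
    nlat, nlon = len(background), len(background[0])
    analysis: Grid = [[0.0] * nlon for _ in range(nlat)]
    disk_accesses = 0
    for lat0, lat1 in split_range(nlat, n_proxies):
        # Proxy: one contiguous read per input field for its whole band.
        bg_band = [row[:] for row in background[lat0:lat1]]
        ob_band = [row[:] for row in observations[lat0:lat1]]
        disk_accesses += 2
        out_band: Grid = [[0.0] * nlon for _ in range(lat1 - lat0)]
        # Scatter: each longitude block goes to one simulated computation process.
        for lon0, lon1 in split_range(nlon, blocks_per_proxy):
            bg_blk = [row[lon0:lon1] for row in bg_band]
            ob_blk = [row[lon0:lon1] for row in ob_band]
            an_blk = local_assimilation(bg_blk, ob_blk, gain)
            # Gather: the proxy places the returned block into its band buffer.
            for i, row in enumerate(an_blk):
                out_band[i][lon0:lon1] = row
        # Proxy: one contiguous write of the analysed band.
        analysis[lat0:lat1] = out_band
        disk_accesses += 1
    return analysis, disk_accesses


if __name__ == "__main__":
    bg = [[0.0] * 8 for _ in range(6)]
    ob = [[2.0] * 8 for _ in range(6)]
    an, n_io = proxy_assimilate(bg, ob, n_proxies=2, blocks_per_proxy=4)
    print(an[0][:3], n_io)
```

With 2 proxies serving 4 blocks each, the sketch performs 6 disk accesses (two reads and one write per proxy). If each of the 8 simulated computation processes instead read both fields and wrote its own block, as in direct parallel IO, the count would be 24. Grouping blocks by latitude band is a design choice of this sketch: each proxy issues a few large contiguous requests instead of many small ones.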