Applies to SUSE Linux Enterprise Server 11 SP4

30 File Synchronization

Many people use several computers today: one at home, one or more at the workplace, and possibly a laptop, tablet, or smartphone on the road. Many files are needed on all of these devices. You want to be able to work on any of them and modify files, while always having the latest version of the data available everywhere.

30.1 Available Data Synchronization Software

Data synchronization is no problem for computers that are permanently linked by means of a fast network. In this case, use a network file system, like NFS, and store the files on a server, enabling all hosts to access the same data via the network. This approach is impossible if the network connection is poor or not permanent. When you are on the road with a laptop, copies of all needed files must be on the local hard disk. However, it is then necessary to synchronize modified files. When you modify a file on one computer, make sure a copy of the file is updated on all other computers. For occasional copies, this can be done manually with scp or rsync. However, if many files are involved, the procedure can be complicated and requires great care to avoid errors, such as overwriting a new file with an old file.
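When copying manually, the main risk is overwriting a newer file with an older one. The following minimal sketch performs a single local copy only if the destination is missing or older, using the -nt (newer than) test operator of bash. The function name copy_if_newer is hypothetical. The check assumes that the time stamps on both sides are comparable, for example with a locally mounted USB disk or network share.

```shell
#!/bin/bash
# Hypothetical helper: copy SRC to DEST only if DEST is missing or older
# than SRC. Only modification times are compared; if both copies were
# edited, the older edit is still lost.
copy_if_newer() {
    local src=$1 dest=$2
    if [ ! -e "$dest" ] || [ "$src" -nt "$dest" ]; then
        cp -p "$src" "$dest" || return 1   # -p keeps the time stamp
        echo "copied: $src -> $dest"
    else
        echo "skipped, destination is not older: $dest"
    fi
}
```

For example, copy_if_newer ~/thesis.tex /media/usb/thesis.tex. GNU cp -u and rsync --update apply the same time stamp rule. None of them notices when both copies were changed; CVS, described below, detects such concurrent changes.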

Warning: Risk of Data Loss

Before you start managing your data with a synchronization system, you should be well acquainted with the program used and test its functionality. A backup is indispensable for important files.

Manual synchronization is time-consuming and error-prone. It can be avoided with programs that automate this job using various methods. The following summaries are merely intended to convey a general understanding of how these programs work and how they can be used. If you plan to use them, read the program documentation.

30.1.1 CVS

CVS, which is mostly used for managing program source versions, offers the possibility of keeping copies of the files on multiple computers. Accordingly, it is also suitable for data synchronization. CVS maintains a central repository on the server in which the files and changes to files are saved. Changes that are performed locally are committed to the repository and can be retrieved from other computers by means of an update. Both procedures must be initiated by the user.

CVS copes well with changes made on several computers. The changes are merged, and if they affect the same lines, a conflict is reported. When a conflict occurs, the repository remains in a consistent state; the conflict is visible only on the client host, where it must be resolved.

30.1.2 rsync

When no version control is needed but large directory structures need to be synchronized over slow network connections, the tool rsync offers well-developed mechanisms for transmitting only the changes within files. This applies not only to text files but also to binary files. To detect the differences between files, rsync subdivides the files into blocks and computes checksums over them.

Detecting the changes comes at a price: the systems to be synchronized should have generous resources for rsync, especially RAM.
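To make the block-checksum idea concrete, the following shell sketch splits two versions of a file into fixed-size blocks and compares a checksum of each block. It is a deliberate simplification, not rsync's actual algorithm: rsync additionally uses a rolling checksum so that it can find matching blocks at any byte offset, which this sketch cannot. The function name changed_blocks is hypothetical; it relies on split and md5sum from GNU coreutils.

```shell
#!/bin/bash
# Simplified block comparison; NOT rsync's real algorithm, which also uses a
# rolling checksum to match blocks at arbitrary offsets.
# Usage: changed_blocks OLD NEW BLOCKSIZE
# Prints the 0-based indexes of blocks in NEW that differ from OLD.
changed_blocks() {
    local old=$1 new=$2 bs=$3 tmp nb ob i=0
    tmp=$(mktemp -d) || return 1
    # Cut both files into blocks named old.aaaa, old.aaab, ... and new.aaaa, ...
    split -a 4 -b "$bs" "$old" "$tmp/old."
    split -a 4 -b "$bs" "$new" "$tmp/new."
    for nb in "$tmp"/new.*; do
        [ -f "$nb" ] || continue          # NEW was empty: no blocks at all
        ob="$tmp/old.${nb##*/new.}"
        # Changed if OLD has no block at this position or the checksums differ
        if [ ! -f "$ob" ] ||
           [ "$(md5sum < "$nb")" != "$(md5sum < "$ob")" ]; then
            echo "$i"
        fi
        i=$((i + 1))
    done
    rm -rf "$tmp"
}
```

With 4-byte blocks, changing one byte in AAAABBBBCCCC to get AAAAXBBBCCCC reports only block 1. Inserting a single byte at the start (XAAAABBBBCCCC) shifts every block, so this fixed-offset sketch reports all four blocks as changed; rsync's rolling checksum would still find the old blocks at their new offsets and transfer little more than the inserted byte.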

30.2 Determining Factors for Selecting a Program

There are some important factors to consider when deciding which program to use.

30.2.1 Client-Server versus Peer-to-Peer

Two different models are commonly used for distributing data. In the first model, all clients synchronize their files with a central server. The server must be accessible by all clients at least occasionally. This model is used by CVS.

The other possibility is to let all networked hosts synchronize their data with each other as peers. rsync actually works in client mode, but any client can also act as a server.

30.2.2 Portability

CVS and rsync are available not only for Linux but also for many other operating systems, including various Unix and Windows systems.

30.2.3 Interactive versus Automatic

In CVS, the data synchronization is started manually by the user. This allows fine control over the data to synchronize and easy conflict handling. However, if the synchronization intervals are too long, conflicts are more likely to occur.

30.2.4 Conflicts: Incidence and Solution

Conflicts only rarely occur in CVS, even when several people work on one large program project. This is because the documents are merged on the basis of individual lines. When a conflict occurs, only one client is affected. Usually conflicts in CVS can easily be resolved.

There is no conflict handling in rsync. The user is responsible for not accidentally overwriting files and manually resolving all possible conflicts. To be on the safe side, a versioning system like RCS can additionally be employed.

30.2.5 Selecting and Adding Files

In CVS, new directories and files must be added explicitly using the command cvs add. This results in greater user control over the files to synchronize. On the other hand, new files are often overlooked, especially when the question marks in the output of cvs update are ignored due to the large number of files.
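To avoid overlooking new files, the question-mark lines can be filtered out of the update output. The sketch below reads the output of cvs -nq update (which only reports and does not change any files) on standard input and prints the names of files that are not yet under CVS control. The function name unversioned_files is hypothetical.

```shell
#!/bin/bash
# Hypothetical filter: read the output of "cvs -nq update" on standard input
# and print only the names of files CVS does not know about yet
# (lines of the form "? filename").
unversioned_files() {
    sed -n 's/^? //p'
}
```

Typical use is cvs -nq update 2>/dev/null | unversioned_files. Review the list, then add the files that should be synchronized with cvs add.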

30.2.6 History

An additional feature of CVS is that old file versions can be reconstructed. A brief log message can be recorded for each change, so the development of the files can easily be traced later based on the content and the messages. This is a valuable aid for theses and program source code.

30.2.7 Data Volume and Hard Disk Requirements

A sufficient amount of free space for all distributed data is required on the hard disks of all involved hosts. CVS requires additional space for the repository database on the server. The file history is also stored on the server, requiring even more space. When files in text format are changed, only the modified lines need to be saved. Binary files require additional space amounting to the size of the file every time the file is changed.

30.2.8 GUI

Experienced users normally run CVS from the command line. However, graphical user interfaces are available for Linux (such as cervisia) and other operating systems (like wincvs). Many development tools (such as kdevelop) and text editors (such as Emacs) provide support for CVS. The resolution of conflicts is often much easier to perform with these front-ends.

30.2.9 User Friendliness

rsync is rather easy to use and is also suitable for newcomers. CVS is somewhat more difficult to operate, because users need to understand how the repository and the local data interact. Changes from the repository should first be merged into the local copy with the command cvs update. Then the local changes are sent back to the repository with the command cvs commit. Once this procedure has been understood, newcomers are also able to use CVS with ease.
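The update-then-commit cycle can be sketched as a shell function that refuses to commit if the update failed or reported a conflict (a line starting with C). The function name cvs_sync and its default log message are hypothetical; the global option -q and the update option -d (also fetch new directories from the repository) are standard CVS options.

```shell
#!/bin/bash
# Hypothetical helper for the update-then-commit cycle: merge changes from
# the repository first, and commit only if that update succeeded without
# conflicts (lines of the form "C filename").
cvs_sync() {
    local msg=${1:-synchronization} log
    log=$(mktemp) || return 1
    # Step 1: bring the working copy up to date
    if ! cvs -q update -d > "$log" 2>&1 || grep -q '^C ' "$log"; then
        cat "$log"
        echo "Update failed or reported conflicts; nothing committed." >&2
        rm -f "$log"
        return 1
    fi
    cat "$log"
    rm -f "$log"
    # Step 2: send the local changes back to the repository
    cvs -q commit -m "$msg"
}
```

Run it from inside the working copy, for example cvs_sync 'edited chapter 2'. If it stops because of a conflict, resolve the conflict and run it again.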

30.2.10 Security against Attacks

During transmission, the data should ideally be protected against interception and manipulation. CVS and rsync can easily be used via ssh (secure shell), providing security against attacks of this kind. Running CVS via rsh (remote shell) should be avoided. Accessing CVS with the pserver mechanism in insecure networks is likewise not advisable.

30.2.11 Protection against Data Loss

CVS has been used by developers for a long time to manage program projects and is extremely stable. Because the development history is saved, CVS even provides protection against certain user errors, such as unintentional deletion of a file.

Table 30.1: Features of the File Synchronization Tools: -- = very poor, - = poor or not available, o = medium, + = good, ++ = excellent, x = available

Feature          CVS               rsync
Client/Server    C-S               C-S
Portability      Lin, Un*x, Win    Lin, Un*x, Win
Interactivity    x                 x
Speed            o                 +
Conflicts        ++                o
File Selection   Sel./file, dir.   Dir.
History          x                 -
Hard Disk Space  --                o
GUI              o                 -
Difficulty       o                 +
Attacks          + (ssh)           + (ssh)
Data Loss        ++                +

30.3 Introduction to CVS

CVS is suitable for synchronization purposes if individual files are edited frequently and are stored in a text format, such as ASCII text or program source code. The use of CVS for synchronizing data in other formats (such as JPEG files) is possible, but leads to large amounts of data, because all variants of a file are stored permanently on the CVS server. In such cases, most of the capabilities of CVS cannot be used. The use of CVS for synchronizing files is only possible if all workstations can access the same server.

30.3.1 Configuring a CVS Server

The server is the host on which all valid files are located, including the latest versions of all files. Any stationary workstation can be used as a server. If possible, the data of the CVS repository should be included in regular backups.

When configuring a CVS server, it might be a good idea to grant users access to the server via SSH. If the user is known to the server as tux and the CVS software is installed on the server as well as on the client, the following environment variables must be set on the client side:

export CVS_RSH=ssh
export CVSROOT=tux@server:/serverdir

The command cvs init can be used to initialize the CVS server from the client side. This needs to be done only once.

Finally, the synchronization must be assigned a name. Select or create a directory on the client to contain the files to manage with CVS (the directory can also be empty). It is convenient to use the name of the directory as the name of the synchronization; in this example, both are called synchome. Change to this directory and enter the following command to set the synchronization name to synchome:

cvs import synchome tux wilber

In this command, tux is the vendor tag and wilber the release tag; cvs import requires both, but their names can be chosen freely. Many CVS commands require a comment. For this purpose, CVS starts an editor (the editor defined in the environment variable $EDITOR, or vi if no editor was defined). The editor call can be circumvented by entering the comment in advance on the command line, as in the following example:

cvs import -m 'this is a test' synchome tux wilber
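The one-time setup steps above can be collected into a single shell function. It is a sketch that reuses the example values from the text (user tux, host server, repository /serverdir, module synchome, tags tux and wilber); the function name setup_cvs_sync and its directory argument are hypothetical.

```shell
#!/bin/bash
# Sketch of the one-time setup described above. The user tux, the host
# server, the path /serverdir, the module synchome and the tags tux/wilber
# come from the text; the function name and its DIR argument are hypothetical.
setup_cvs_sync() {
    local dir=$1    # existing directory with the files to manage (may be empty)
    export CVS_RSH=ssh
    export CVSROOT=tux@server:/serverdir
    cvs init || return 1    # creates the repository; only needed once
    # Import from inside DIR, in a subshell so the caller's directory is kept
    ( cd "$dir" && cvs import -m 'initial import' synchome tux wilber )
}
```

For example, setup_cvs_sync ~/synchome. The directory you imported from does not become a working copy; check the module out as described in Section 30.3.2, Using CVS.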

30.3.2 Using CVS

The synchronization repository can now be checked out from all hosts with cvs co synchome. This creates a new subdirectory synchome on the client. To commit your changes to the server, change to the directory synchome (or one of its subdirectories) and enter cvs commit.

By default, all files (including subdirectories) are committed to the server. To commit only individual files or directories, specify them as in cvs commit file1 directory1. New files and directories must be added to the repository with a command like cvs add file1 directory1 before they are committed to the server. Subsequently, commit the newly added files and directories with cvs commit file1 directory1.
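Files in binary formats, such as the JPEG files mentioned in Section 30.3, should be added with the -kb option of cvs add, so that CVS neither expands keywords nor converts line endings in them. The following sketch chooses the option by file extension; the function name cvs_add_files and the list of extensions are assumptions to adapt.

```shell
#!/bin/bash
# Hypothetical helper: add files to CVS, using -kb for files whose
# extension suggests a binary format. The extension list is only an
# example, and the match is case-sensitive (PHOTO.JPG is not caught).
cvs_add_files() {
    local f
    for f in "$@"; do
        case $f in
            *.jpg|*.jpeg|*.png|*.gif|*.pdf|*.odt|*.gz|*.zip)
                cvs add -kb "$f" ;;   # binary: no keyword expansion
            *)
                cvs add "$f" ;;       # text: normal line-based handling
        esac || return 1
    done
}
```

For example, cvs_add_files photo.jpg notes.txt, followed by cvs commit photo.jpg notes.txt.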

If you change to another workstation, check out the synchronization repository if this has not been done during an earlier session at the same workstation.

Start the synchronization with the server with cvs update. Update individual files or directories as in cvs update file1 directory1. To see the difference between the current files and the versions stored on the server, use the command cvs diff or cvs diff file1 directory1. Use cvs -nq update to see which files would be affected by an update.

Here are some of the status symbols displayed during an update:

U

The local version was updated from the repository. This affects files that are missing on the local system and files that were not changed locally but have a newer version on the server.

M

The local version was modified. If there were changes on the server, they were merged into the local copy.

P

The local version was patched with the version on the server.

C

The local file conflicts with the current version in the repository.

?

This file is not under CVS control (it has not been added with cvs add).

The status M indicates a locally modified file. Either commit the local copy to the server, or remove the local file and run the update again; in this case, the missing file is retrieved from the server. If someone else changed the same lines of the file and committed them in the meantime, CVS cannot merge the two versions automatically: it refuses to commit your file until you run cvs update, and the update reports a conflict, indicated with C.

In this case, look at the conflict markers in the file (lines beginning with <<<<<<<, =======, and >>>>>>>) and decide between the two versions. As this can be a rather unpleasant job, you might decide to abandon your changes, delete the local file, and enter cvs up to retrieve the current version from the server.
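Abandoning local changes as described above can be wrapped in a small function. The sketch below keeps the local version under a new name instead of deleting it, in case parts of it are still needed; the function name discard_local_changes and the .mine suffix are hypothetical choices.

```shell
#!/bin/bash
# Hypothetical helper: give up local changes to FILE and fetch the current
# repository version. The local copy is kept as FILE.mine, not deleted.
discard_local_changes() {
    local file=$1
    [ -f "$file" ] || { echo "No such file: $file" >&2; return 1; }
    mv "$file" "$file.mine" || return 1
    cvs -q update "$file"     # the missing file is retrieved from the server
}
```

Run it inside the working copy, for example discard_local_changes notes.txt, and delete notes.txt.mine once it is no longer needed.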

30.4 Introduction to rsync

rsync is useful when large amounts of data that change only slightly need to be transmitted regularly. This is often the case when creating backups, for example. Another application concerns staging servers. These are servers that store complete directory trees of Web servers that are regularly mirrored onto a Web server in a DMZ.

30.4.1 Configuration and Operation

rsync can be operated in two different modes. It can be used to archive or copy data. To accomplish this, only a remote shell, like ssh, is required on the target system, in addition to the rsync program itself. However, rsync can also be used as a daemon to provide directories to the network.

The basic mode of operation of rsync does not require any special configuration. rsync directly allows mirroring complete directories onto another system. As an example, the following command creates a backup of the home directory of tux on a backup server named sun:

rsync -baz -e ssh /home/tux/ tux@sun:backup

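Here, -a (archive mode) preserves permissions, time stamps, and other file attributes, -z compresses the data during transfer, and -b renames any existing file on the server before it is replaced, keeping the old version as a backup (with a ~ suffix by default). The following command restores the directory: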

rsync -az -e ssh tux@sun:backup/ /home/tux/

Up to this point, the handling does not differ much from that of a regular copying tool, like scp.
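The trailing slashes in these commands matter: a source ending in / copies the contents of the directory, while a source without it copies the directory itself, creating an extra level at the destination. The following sketch wraps both commands in functions so that the slashes are always set consistently; BACKUP_HOST, BACKUP_DIR, and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrappers around the backup and restore commands above.
# BACKUP_HOST and BACKUP_DIR are illustrative variables, not rsync settings.
BACKUP_HOST=${BACKUP_HOST:-tux@sun}
BACKUP_DIR=${BACKUP_DIR:-backup}

backup_home() {
    # "$HOME/" with a trailing slash: copy the contents of the home
    # directory into $BACKUP_DIR, not a directory named after the user
    rsync -baz -e ssh "$HOME/" "$BACKUP_HOST:$BACKUP_DIR"
}

restore_home() {
    # Trailing slash on the remote source: files land directly in $HOME
    # instead of in $HOME/$BACKUP_DIR
    rsync -az -e ssh "$BACKUP_HOST:$BACKUP_DIR/" "$HOME/"
}
```

Adding -n (--dry-run) to either rsync command shows what would be transferred without changing anything.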

To make all of its features available, rsync should be operated in daemon mode. This is done by starting the rsyncd daemon on one of the systems. Configure it in the file /etc/rsyncd.conf. For example, to make the directory /srv/ftp available with rsync, use the following configuration:

gid = nobody
uid = nobody
read only = true
use chroot = no
transfer logging = true
log format = %h %o %f %l %b
log file = /var/log/rsyncd.log

[FTP]
        path = /srv/ftp
        comment = An Example

Then start rsyncd with rcrsyncd start. rsyncd can also be started automatically during the boot process. Set this up by activating this service in the runlevel editor provided by YaST or by manually entering the command insserv rsyncd. rsyncd can alternatively be started by xinetd. This is, however, only recommended for servers that rarely use rsyncd.

The example also creates a log file listing all connections. This file is stored in /var/log/rsyncd.log.

It is then possible to test the transfer from a client system. Do this with the following command:

rsync -avz sun::FTP

This command lists all files present in the directory /srv/ftp of the server. This request is also logged in the log file /var/log/rsyncd.log. To start an actual transfer, provide a target directory. Use . for the current directory. For example:

rsync -avz sun::FTP .

By default, rsync does not delete any files while synchronizing. To delete files on the receiving side that no longer exist on the sending side, add the option --delete. To avoid overwriting files that are newer on the receiving side, add the option --update. Any conflicts that arise must be resolved manually.
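Because --delete removes files on the receiving side, it is prudent to preview its effect first. The following sketch runs rsync with -n (--dry-run) before the real transfer and asks for confirmation in between; the function name mirror_module is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: mirror SRC (for example sun::FTP) into DEST,
# deleting files in DEST that no longer exist in SRC. A dry run (-n)
# lists the planned changes, including deletions, before anything happens.
mirror_module() {
    local src=$1 dest=$2 answer
    rsync -avzn --delete "$src" "$dest" || return 1
    printf 'Apply these changes? [y/N] ' >&2
    read -r answer
    case $answer in
        y|Y) rsync -avz --delete "$src" "$dest" ;;
        *)   echo "Aborted, nothing changed." >&2; return 1 ;;
    esac
}
```

For example, mirror_module sun::FTP /srv/mirror/ftp prints the dry-run listing and transfers only after you answer y.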

30.5 For More Information

CVS

Important information about CVS can be found on the home page http://www.cvshome.org.

rsync

Important information about rsync is provided in the man pages man rsync and man rsyncd.conf. A technical reference about the operating principles of rsync is featured in /usr/share/doc/packages/rsync/tech_report.ps. Find the latest news about rsync on the project Web site at http://rsync.samba.org/.
