Versioning data in Pachyderm works much like versioning code in Git. The primitives are similar:
- Repositories: These are versioned collections of data, similar to having versioned collections of code in Git repositories
- Commits: Data is versioned in Pachyderm by committing it to a data repository
- Branches: These are lightweight pointers to certain commits or sets of commits (for example, master points to the latest HEAD commit)
- Files: Data is versioned at the file level in Pachyderm, and Pachyderm automatically employs strategies, such as de-duplication, to keep your versioned data space-efficient
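
To make the four primitives above concrete, here is a minimal in-memory sketch in Python. It is only an illustration: `ToyRepo` and its methods are hypothetical names, not Pachyderm's API, and hashing file contents is just one plausible de-duplication strategy, not a description of how Pachyderm actually stores data.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyRepo:
    """Toy model of a data repository (hypothetical, not Pachyderm's API)."""
    blobs: Dict[str, bytes] = field(default_factory=dict)             # content hash -> file bytes
    commits: Dict[int, Dict[str, str]] = field(default_factory=dict)  # commit id -> {path: hash}
    branches: Dict[str, int] = field(default_factory=dict)            # branch name -> commit id

    def commit(self, branch: str, files: Dict[str, bytes]) -> int:
        parent = self.branches.get(branch)
        # Start from the parent's snapshot so unchanged files carry over.
        snapshot = dict(self.commits[parent]) if parent is not None else {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs.setdefault(digest, data)  # identical content is stored once
            snapshot[path] = digest
        commit_id = len(self.commits) + 1
        self.commits[commit_id] = snapshot
        self.branches[branch] = commit_id  # the branch now points at the new commit
        return commit_id

    def read(self, commit_id: int, path: str) -> bytes:
        return self.blobs[self.commits[commit_id][path]]


repo = ToyRepo()
c1 = repo.commit("master", {"/a.csv": b"1,2,3\n"})
c2 = repo.commit("master", {"/b.csv": b"1,2,3\n"})  # same bytes, new path
```

After the two commits, `master` points at `c2`, the older commit `c1` is still readable, and both files share a single stored blob because their contents are identical.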
Even though versioning data with Pachyderm feels similar to versioning code with Git, there are some major ...