Tuesday, 18 October 2011

DTrace and GIT

I have had a few emails asking "where is the latest dtrace?", despite
the answer being posted on every web page I maintain. I provide the lowest common
denominator - tarballs! Such a nice word :-)

People ask: can't I use "git"? Well... the simple answer is "no". There
is an unmaintained(?) dtrace GitHub page, but it wasn't set up by me -
it was set up by an enthusiastic supporter.

I can understand people either wanting to track changes or make
contributions.

So, I am opening up the conversation to people: Just how badly can
I damage GIT?!

I have recently started using GIT and automating the commits at home,
but I am lacking an understanding of git and how to cope with complexity.

Here's the deal. In theory I have two main machines - a server,
rarely switched on, but the "master", and my laptop, where I do most
of my work -- manually syncing changes (not just dtrace, but
for CRiSP and other things) across the machines.

I set up git on my master and laptop ($HOME/git) and use symlinks in
my source code dirs so that the git repository is in its own tree.

Previously, I would just create periodic tarballs as snapshots -
which are mostly fine, but not necessarily synchronised to the sync points.

I rsync my laptop/master git repositories - probably a bad thing. Is it?

So, if an external facing git repository is available, what does it
achieve?

Q1: Can I sync to the external repository from my internal one, and stop doing
tarballs? (Or keep doing both?)

Q2: Who can touch the git repo? Presumably whoever I give permission to - or is
it a free-for-all?

Q3: Assuming it's a trusted circle of people, how do I sync
from that repo back to my local git repo?

I really want to review what people do and likely not
accept some contributions or recode them to fit in with my "style".

I don't want to be a Linus/Git-meister (but will if need be - if it
helps the greater good).

So, educate me or be gentle with me.

(I am busy adding some new features to CRiSP and fcterm to show
outline grids whilst editing, and when I finish this, I may go back to
Dtrace and start to remember "What was I planning to do next").

Post created by CRiSP v10.0.17a-b6082


6 comments:

  1. I'm still learning git. I've found the Git Community Book (http://book.git-scm.com/) to be very useful and clear for starting.

    Re: rsync my laptop/master git repos
    You probably can do that, but it may mess up the git information in both repositories. It's better to designate the master as the "pristine" copy and push your changes from the laptop git repo to the master when you are ready. You may keep rsyncing, but there would be little need, since git will supposedly keep everything in sync in a finer-grained manner.
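    A minimal sketch of that push-based sync, using synthetic local paths ("master.git" stands in for the server's pristine repository, "laptop" for the working machine - the names and layout are illustrative, not the real setup):

```shell
# Hypothetical layout: a bare "pristine" repo plus a working clone, all in a
# temp dir so the sketch is self-contained.
set -e
dir=$(mktemp -d)
cd "$dir"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q --bare master.git        # the designated pristine copy (the server)
git clone -q master.git laptop       # done once on the laptop
cd laptop
echo "work" > notes.txt
git add notes.txt
git commit -q -m "work done on the laptop"
git push -q origin HEAD              # sync by pushing, instead of rsyncing
git --git-dir=../master.git log --oneline   # the commit is now on the master
```

    In real use the clone URL would be something like `ssh://server/path/to/master.git`; pushing over ssh works the same way as the local push above.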

    Re: Q1
    You can sync in both directions (i.e. from external to internal, or from internal to external). You can stop doing tarballs at that point.

    Re: Q2
    Depends on the permissions. The canonical method is to provide read-only access to the public-at-large so that people can clone your external git repository. Then they work on their own copy, and can make patches available for you to pull at your leisure. Contributors need to notify you (usually email) when they have patches for you to review.
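    That clone-and-mail-a-patch workflow can be sketched as below; "upstream" and "contrib" are made-up stand-ins for your public repo and a contributor's clone, built in a temp dir so the whole thing is self-contained:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# "upstream" plays your public read-only repository.
git init -q upstream
echo "int x;" > upstream/code.c
git -C upstream add code.c
git -C upstream commit -q -m "initial release"

# A contributor clones it, commits locally, and makes a patch file to mail you.
git clone -q upstream contrib
echo "/* contributor fix */" >> contrib/code.c
git -C contrib commit -q -am "contributor fix"
git -C contrib format-patch -1 -o "$dir" HEAD  # writes 0001-contributor-fix.patch

# You review the patch as plain text and, if you approve, apply it with git am.
git -C upstream am -q "$dir"/0001-contributor-fix.patch
git -C upstream log --oneline
```

    The nice property is that the patch file is just text: it can be read, edited, or rejected before anything touches your repository.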


    Re: Q3
    You can give a trusted circle of people write access to your external git repo. Once they are done with their modifications in their own private git repositories, they can push their changes to your external repo whenever they want.

    Your local git repo is equal to everybody else's own local git repo; you'd have write access to the external and (preferably) only push tested and approved modifications to the external repository.

    In git, all repositories are equal, in the sense that if for some reason you were knocked out of commission and nobody had access to your master git repository, everybody else could continue on where work was left off by collectively designating a new canonical source to push patches to. Since you already release the dtrace tarballs, people could already do this anyway, but at least they would be able to preserve the git histories and presumably be able to go back and review commits in case there are problems that need to be fixed/backed out.

    Re: reviewing code
    Since people will presumably treat your git repo as the canonical source and collectively agree to push changes there (which isn't any different than today where you provide the tarballs and people have provided patches, issues and feedback to you directly), you can choose which patches to pull into the external dtrace repo.

    People use services like Bitbucket or GitHub because they make the management of patches and issue lists slightly easier to scale from lone-developer to community effort in one location.

  2. Many thanks otoburb - your reply was very useful in convincing me that this is the way forward. I will need to experiment and (re)setup the github area and toy with the sync commands.

    One question I have: is there any mileage in recreating my repo, using all the prior tarballs to recreate the history? Or can Day Zero just be the day I open up as part of my release cycle?

    ("Summary" changes are in the "Changes" file - which is a sort of no-SCM/cheap/effective way of reminding me what I did when).

  3. My pleasure Paul!
    There is no mileage in recreating your repo from the previous tarballs. I'm sure that other people would argue differently just so they can have the history, but they can easily see that in the original "Changes" file.

    The benefit of having all of the Changes migrated into the git repo is marginal IMO.

    Better to have the git repo up and running so those people who have been asking in email can start cloning and (hopefully) working on improvements with you.

  4. I just realized that I may be contradicting myself when I discount the revision history up to the latest tarball, when in the first comment I brought it up as a valuable way to diagnose where issues might be in the code ("was it this commit, or this one, that broke X?").

    To be clearer, if/as more people contribute code over a longer period of time, the value of the public revision history goes up. This is mainly because people can use commands like 'git bisect' to locate the specific change that broke something, and then back it out (in their own local git repos).

    However, it's highly unlikely that other people will want to back out any of *your* changes prior to Day Zero, because the working assumption is that your releases are of a higher quality. This is why I discount the value of incrementally uploading your tarballs, and hypothesize that you're better off just opening DTrace up as part of your release cycle.
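    A toy sketch of the 'git bisect' search mentioned above - everything here is synthetic (five commits, one of which plants a "bug" string), just to show the mechanics:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo
for i in 1 2 3 4 5; do
  echo "version $i" > file
  if [ "$i" -ge 4 ]; then echo "bug" >> file; fi   # commit 4 introduces the bug
  git add file
  git commit -q -m "commit $i"
done

# Bisect between the first (good) and last (bad) commit. 'git bisect run'
# invokes the test command at each step: exit 0 means good, non-zero means bad.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)"
git bisect run sh -c '! grep -q bug file' > /dev/null
bad=$(git rev-parse refs/bisect/bad)   # the first bad commit
git bisect reset                        # back to normal
git show -s --format=%s "$bad"          # prints: commit 4
```

    With a real test script in place of the `grep`, bisect narrows N commits down to the culprit in about log2(N) build-and-test steps.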

    Excellent - thanks for the clarification Otoburb. It may take me a while to do this. My laptop just developed a nice thick band (~80 pixels high) across the screen, and I may need to go back to the prior one and resync.

    I got some good comments from someone else (Nigel). Reading the feedback suggests the value is in future changes, when lots of people are committing; but if no one commits, the value for the time spent is lower.

    My other question on the thread is "why git", when mercurial/subversion/perforce/whatever is used by different people. I assume there are bridges from one to the other for all permutations.

  6. Paul:

    I work for Google on Chromium/Chromium OS, and we use git. The typical way it's used is to have an internal git repo and an external R/O one, and changes are intermediated by gerrit: http://code.google.com/p/gerrit/

    This gives you fine-grained control over code, as well as code review and the ability to designate yourself as having final say on committing changes.

    As far as revision history:

    Before Google, I used to work at Apple in the Core OS kernel team (I was one of the people who ported DTrace to Mac OS X, in fact; Hi, fellow DTrace'r!).

    One of the worst practices Apple has is to take secret projects off the grid, then dump them into the main tree all at once. This loses the revision history, which in turn loses the reasons changes were made. The entirety of the Intel port basically came in as a lump, with no reasoning as to why changes were made.

    Barring secret project spam like that... when converting the kernel (xnu) from CVS to SVN, we so hated the big lump dumps that one of our core requirements was that the conversion recreate the revision history from the CVS repository.

    The conversion basically did a check-in-by-check-in replay of the modifications to the CVS repo into the new SVN repo. The data captured (not lost) in doing this saved tens of thousands of engineering hours that would have been spent backing out intentional workarounds that looked hinky with no context.

    I think you could easily automate unpacking your tarballs in date order and committing them into the repo, with the commit messages taken from whatever change log, readme, etc. data has changed: whatever you think is relevant.

    Since this could be automated, there would not be a heck of a lot of cost, and you could use the git equivalent of cvs or svn annotate (git blame) to see at least a general reason a particular line of code changed.
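    A rough sketch of that automation, with fabricated snapshots standing in for the real release tarballs (the dtrace-YYYYMMDD.tar.gz naming is an assumption chosen so that lexical sort order equals date order):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Fabricate two dated snapshots standing in for the real release tarballs.
for d in 20110101 20110601; do
  mkdir "snap-$d"
  echo "release $d" > "snap-$d/Changes"
  tar czf "dtrace-$d.tar.gz" -C "snap-$d" .
  rm -r "snap-$d"
done

# Replay the tarballs, oldest first, as one commit each.
git init -q repo
for t in dtrace-*.tar.gz; do
  # Wipe the working tree (but not .git), unpack the snapshot, commit wholesale.
  find repo -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
  tar xzf "$t" -C repo
  git -C repo add -A
  git -C repo commit -q -m "import snapshot $t"
done
git -C repo log --oneline    # one commit per tarball, newest first
```

    The wipe-then-unpack step matters: it lets `git add -A` record file deletions between releases as well as additions, so each commit is an exact image of that tarball.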

    PS: if you open things up to contribution, and we can get the work to a point of usefulness for Chromium/Chromium OS (we're going to try this, in any case), I expect that you will get contributors and supporters from within Google.
