PPIG'07 Full paper
A Coding Scheme Development Methodology Using Grounded Theory for Qualitative Analysis of Pair Programming
Stephan Salinger, Laura Plonka, and Lutz Prechelt
Freie Universität Berlin, Institut für Informatik,
Takustr. 9, 14195 Berlin, Germany
salinger,plonka,prechelt@inf.fu-berlin.de
Abstract. Since a number of quantitative studies of pair programming (the practice of two programmers working together using just one computer) have produced somewhat conflicting results, a number of researchers have started to study pair programming qualitatively. While most such studies use coding schemes that are fully or partially predefined, we have decided to go the long way and use Grounded Theory (GT) to ground each and every statement we make directly in observations.
The first intermediate goal, which we talk about here, was to produce a coding scheme that would allow the objective conceptual description of specific pair programming sessions independent of a particular research goal.
The present article explains how our initial attempts at using the method of Grounded Theory failed and which practices we developed to avoid these difficulties: pre-determined perspective on the data, concept naming rules, analysis results meta-model, and pair coding. We expect these practices to be helpful in all GT situations, in particular those involving very rich data such as video data.
We illustrate the operation and usefulness of these practices by real examples derived from our coding work and also present a few preliminary hypotheses regarding pair programming that we have stumbled across.
1 Introduction
During the last few years, pair programming, as it is known from extreme programming [1], has been the subject of many empirical investigations. This research focussed mainly on the measurement of bottom line pair programming effects, whereas the underlying process of pair programming has been regarded as a kind of black box, the output of which is analyzed quantitatively with respect to its performance, error rate, programmer satisfaction etc.
Unfortunately, the results of this research are often contradictory. For instance, regarding total effort, Williams found that pair programming results in a 15% increase compared to solo programming [2], Lui and Chan found 21% [3], and Nawrocki et al. found 48% [4]. Most likely these differences are caused by differences in moderator variables such as programmer and pair experience, type of task etc., but we know neither the complete set of relevant moderator variables nor the nature and mechanism of their influence.
Our goal as software engineering researchers is to understand pair programming in such a way that we can advise practitioners how to use it most efficiently.
We propose that the only way to obtain such understanding is to understand the mechanisms at work in the actual pair programming process. Obviously, this understanding must first be gained in qualitative form before we can start quantifying, and since we do not know much yet, the investigation has to start in an exploratory fashion. We have started such an investigation based on the Grounded Theory (GT) methodology [5] and working from rich sets of data (full-length audio, programmer video, and screen video of pair programming sessions). The present paper presents a number of important methodological insights gained during this research and a few initial results. Its contributions are the following:
– a description of stumbling blocks for a GT-based analysis in this area;
– a set of practices that extend the plain GT method and help overcome these obstacles;
– a sketch of a pair programming process coding scheme.
In subsequent research, the coding scheme is supposed to form the basis for more detailed conceptual descriptions of the pair programming process and also to support the proposition of hypotheses and theory construction.
We will first give a short introduction to Grounded Theory (Section 2) and describe the nature and origin of our raw data (Section 3). The heart of the paper describes how and why plain GT does not work well under these constraints (Section 4) and which practices help to make it work better (Section 5). Section 6 presents the application of the modified GT process and a few of its initial results, namely excerpts of a coding scheme for describing the activities occurring during pair programming. We close by outlining related work (Section 7) and offering a summary and outlook (Section 8). The paper focuses on research method, not on research results. The results mostly serve to illustrate the method.
2 The Grounded Theory methodology
As mentioned above, the initial analysis of pair programming has to be exploratory. In order to be as open as possible with respect to the nature and content of the results, we pick Grounded Theory as our analysis approach.
GT, first described in [6], is a data analysis approach that is largely data-driven (i.e. uses hardly any prior assumptions or pre-defined terminology) and aims at producing a theory that describes interesting relationships between things, situations, events, and activities (together called phenomena) reflected in the data by means of abstract concepts. The term grounded indicates that this theory will contain only statements derived from actual observations in a manner that can be traced back to these data: the theory is grounded in the data.
We use the variant of GT described by Strauss and Corbin [5], who suggest three (partially parallel) activities for a GT-based data analysis:
1. Open coding describes the data by means of conceptual (rather than merely descriptive) codes, which are derived directly from the data.
2. Axial coding identifies relationships between the concepts described by these codes. Strauss and Corbin suggest a concrete set of relationships to check for (in particular: causal conditions lead to phenomena which exist in a context featuring intervening conditions and leading to participants' strategies which create certain consequences). These relationships (plus the slightly fuzzy notion of forming categories) they call paradigmatic model, a term we will use a few times further below.
3. Selective coding extracts a subset of the concepts and relationships thus found and formulates them into a coherent theory. Selective coding is not relevant for the development of a coding scheme and will not be discussed in the present article.
Strauss considered the following three aspects to be the core of the GT method, saying “When you do all of these, then it is Grounded Theory, if you do not, then it is something else” [7]:
– Theoretical coding: Codes are theoretical, not just descriptive; they reflect concepts which have potential explanatory value for the phenomena described.
– Theoretical sampling: The selection of the material to be analyzed is made incrementally in the course of the analysis, based on what is expected to be most relevant for the theory under development.
– Constant comparison: Observed phenomena (and their contexts) are compared many times in order to create codes that are precise and consistent.
Theoretical sampling is of less interest in the present article, but theoretical coding and constant comparison are of vital importance to understand the discussion.
3 Data used for the analysis of pair programming
In the following, we describe our observation context (programmers and task) and the data capturing method used.
3.1 Observation context: The origin of our data
We observed (in the manner described below) seven pairs of graduate students who all worked on the same task. Six of these pairs had worked together as pairs previously. The average work time (which was not limited) was 3.8 hours. The students were all participants of a highly technical course on enterprise information systems and the Java 2 Enterprise Edition (J2EE) architecture and technologies. The specific task called for an extension of an existing web shop application. The task required broad passive J2EE knowledge for analyzing and understanding the existing system and specific operational knowledge about JMS, JNDI, and the JBoss application server for programming, configuring, and testing the actual extension. The task was non-trivial so that only three of the pairs were completely successful.
For the analysis described in the present article, we used the session of one of the successful pairs only; it is 2 hours and 58 minutes long.
3.2 Observation method: Data capturing procedure
Since we do not know in advance what will be important and what will not, we need to start from a rather rich data set. We use three different data sources:
– Audio recording captures verbal communication among the participants as well as other noises, vocal or other, that may help with the interpretation of the remaining data.
– Frontal-perspective video of the programmers (shot from above-behind the screen and reaching down to about waist level) captures aspects of facial expression, gestures, posture, direction of attention, and, most relevantly, who is currently operating mouse and keyboard.
– Full-resolution screen recording captures almost all computer activities of the programmers on a fairly fine-grained level.
All three recordings are made at once using Camtasia Studio [8] and unified into a single, fully synchronized video file in which the camera video is superimposed semi-transparently onto a corner of the screen video so that all information is visible at once (multi-dimensional video).
The session was recorded in an otherwise silent office. Combined with the high audio quality of the Logitech 5000 webcam, this provides good acoustical playback conditions.
4 Problems of a plain Grounded Theory data analysis approach
Attempting GT-style exploratory analysis of the rich data set described above¹, we quickly recognized that transcription was not practical. Too much relevant information is found in the screen recording for which it is not obvious how to transcribe it at all, not to speak of the effort for doing so: source code fragment input, using features of the development environment (such as browsing across different files or positions within files), pointing with the mouse during discussion with the partner, etc.
This is why we decided to work on the raw video directly and chose the qualitative data analysis software ATLAS.ti [9] for doing so, which is one of the few products that allow creating direct annotations to video.
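To make the idea of annotating raw video directly more concrete, the following minimal Python sketch models an annotation as a code attached to a time span of the session video. It is an illustration only and not ATLAS.ti's data model or API: the `Annotation` class, both helper functions, and all time stamps are assumptions, and the code names are borrowed from examples mentioned later in this section.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One code attached to a time span of the session video (in seconds)."""
    start_s: float
    end_s: float
    code: str

    def __post_init__(self):
        # Reject empty or reversed spans early.
        if not 0 <= self.start_s < self.end_s:
            raise ValueError("annotation needs 0 <= start_s < end_s")


def code_usage(annotations):
    """Count how often each code was assigned."""
    return Counter(a.code for a in annotations)


def codes_at(annotations, t_s):
    """Return the codes whose half-open span [start_s, end_s) covers t_s."""
    return sorted(a.code for a in annotations if a.start_s <= t_s < a.end_s)


# Hypothetical annotations; the time stamps are invented for illustration.
session = [
    Annotation(12.0, 47.5, "uses documentation"),
    Annotation(40.0, 95.0, "handle problem"),
    Annotation(95.0, 130.0, "test defect fix"),
    Annotation(300.0, 320.0, "uses documentation"),
]
usage = code_usage(session)
```

Counting the distinct codes with `len(usage)` is the kind of bookkeeping that makes uncontrolled growth of the code set visible early.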
One of us, Stephan Salinger, started open coding in the manner suggested by Strauss and Corbin. The short-term goal was to characterize the activities occurring during pair programming, the long-term goal was to identify recurring behavioral patterns and classify them as helpful, hampering, ambivalent, or neutral.
This approach generated no fewer than 194 different concepts and almost complete confusion and despair in the course of a few days of analysis due to the following problems:
– No predefined focus: We had no criteria for selecting which (kinds of) observations to code and which to ignore (code verbal interaction? facial expressions? gestures? posture? directions of gaze? sub-verbal vocal noises? nervous tics? computer input? input methods? computer output? and so on) and consequently were overwhelmed by the data.
– No predefined granularity: We had no prior decision on the level of detail that would be worth coding. As a result, we produced codes on different levels of detail (say, coarse ones such as “handle problem” and finer ones such as “test defect fix”), which were difficult to delineate against one another subsequently.
– No predefined level of acceptable subjectivity: The nature of the codes chosen in GT can be anywhere on the spectrum ranging from codes that stick closely to observations that any observer would agree with to codes that interpret the observation to a degree that they must be called wishful thinking. GT as such does not provide a criterion for deciding where “grounded in data” ends and wishful thinking begins. As a consequence, we mixed objective-descriptive and subjective-evaluative attitudes for selecting codes. This led to codes of different nature (say, descriptive ones such as “uses documentation” and assumption-bearing ones such as “gains knowledge of detail”) existing side-by-side, which made it harder to decide which one to use in a particular case.
– Too many topics: The codes described too many different topics of interest, making it impossible to properly focus on anything. None of the various resulting collections of information ever reached a useful degree of completeness.
– Lack of concept grouping: The diversity of topics also distracted from forming what GT calls categories: a few large groups of heavily interrelated concepts (say, “Human-human interaction”, HHI, and “Human-computer interaction”, HCI).
– Importance misjudgments: Attending to such a broad set of concepts overtaxed our ability to judge their importance; because of the large number of concepts we introduced, we completely overlooked a number of important ones.

¹ Actually a precursor, but very similar in all respects.
After we had noticed and gradually understood a number of these problems, we stopped this mode of investigation completely. We started the whole analysis again from scratch (but very slowly and carefully, with a lot of backtracking) and concurrently redesigned the coding procedure. The result of this redesign was a number of heuristic practices, described below, that help in using the GT analysis process.
5 Practices supporting the analysis of complex video data
The methodological heuristics presented here form the heart of the present article. These intertwined practices serve to reduce or solve the problems described in the previous section. Section 6 will present an application of the practices that also shows how they work together and mutually support one another.
5.1 Practice 1: Perspective on the data
Strauss and Corbin suggest that the start of selective coding (that is, after open coding and axial coding have been going on for quite some time) is the time when you should begin to decide what is important and what is less so. As described above, we found that this is not a good idea when working with rich video data. There are three reasons why a perspective used for the analysis should be defined before starting:
– to avoid drowning in detail;
– to provide constancy in the criteria used for creating and assigning concepts;
– to focus attention on the most relevant aspects.
This perspective can be defined by formulating answers to the following questions. These answers should be reviewed (and perhaps revised) several times in the course of the analysis:
1. In which respects do you expect the data to provide insight?
2. What kinds of phenomena do the researchers allow themselves to identify in the data?
3. What type of result do you want the analysis to bring forth?
Question 1 does not ask what you expect to find, only in what respects you expect to find something. The answer acts as a filter that tells you which phenomena should receive more attention than others. Furthermore, constantly re-checking and adjusting the answer to this question helps in deciding when to stop the analysis, when to modify (or throw overboard) your research question, and when to obtain further or different raw data.
In our case, the expectation was that the data could help understand what activities dominate the pair programming process and how they relate.
Answer 2 provides the mechanism for systematically bounding the nature and amount of subjectivity to be found in the conceptualizations of the data. The strongest restriction would be to allow only concepts that express directly observable phenomena, resulting in a behaviorist (stimulus/response) research perspective. Weaker restrictions might also allow concepts referring to unobservable processes (such as attitudes or thinking processes of actors), concepts that involve predictions (such as “helpful for reaching goal X”), and/or concepts expressing moral judgement (good, bad).
We were convinced that in our case only the behaviorist perspective would enable us to trust our own results.
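The way answer 2 bounds subjectivity can be sketched as a simple filter over candidate concepts. The Python below is only an illustration under stated assumptions: the `Kind` categories paraphrase the four kinds of concepts named above, the function `admissible` is hypothetical, and classifying “gains knowledge of detail” as an unobservable process is our own reading of the Section 4 example, not a classification from the coding scheme.

```python
from enum import Enum


class Kind(Enum):
    """Kinds of phenomena a concept may refer to, from strictest to weakest."""
    OBSERVABLE = "directly observable phenomenon"
    UNOBSERVABLE_PROCESS = "attitude or thinking process of an actor"
    PREDICTION = "prediction such as 'helpful for reaching goal X'"
    MORAL_JUDGEMENT = "moral judgement (good, bad)"


# A perspective is the set of kinds the researchers allow themselves to use.
BEHAVIORIST = frozenset({Kind.OBSERVABLE})
WEAKER = frozenset({Kind.OBSERVABLE, Kind.UNOBSERVABLE_PROCESS})


def admissible(candidates, perspective):
    """Split (name, kind) pairs into allowed and rejected concept names."""
    allowed = [name for name, kind in candidates if kind in perspective]
    rejected = [name for name, kind in candidates if kind not in perspective]
    return allowed, rejected


# Candidate concepts; the kind assigned to the second one is an assumption.
candidates = [
    ("uses documentation", Kind.OBSERVABLE),
    ("gains knowledge of detail", Kind.UNOBSERVABLE_PROCESS),
]
allowed, rejected = admissible(candidates, BEHAVIORIST)
```

Under the behaviorist perspective only the first candidate survives, whereas the weaker perspective would admit both.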
Finally, the result type is the standard used for deciding how much attention to invest in which kinds of phenomena when the analysis resources begin to get scarce (which they will very quickly). It helps to stay on track. Do we want to produce a full conceptual theory? Or just a conceptual structure (system of categories) for the data? Or even just a coding scheme?
In our case, the goal was just to produce a coding scheme, because we felt we knew so little about the internals of pair programming that we should not yet decide on an actual engineering research question.
5.2 Practice 2: Concept name syntax rules
Choosing the names of concepts is another area where we found that giving up some of the freedom postulated by plain GT is beneficial, because our freely chosen concept names turned out to be highly variable and hence difficult to understand, remember, and compare.
As a remedy, we developed a structured naming scheme as described below. Within the confines we set ourselves by practice 1, that is, describing directly observable activities of the pair programmers, the scheme does not predetermine anything with respect to the meaning of a concept, it only prescribes the shape of its name. When working with this scheme, we observed the following benefits:
– A concept will be better understood right at introduction time.
– It facilitates handling and keeping an overview of a large set of concepts.
– Some relationships between concepts are implicitly recorded as well, which much simplifies axial coding and the forming of categories.
– A concept name explicitly represents several aspects at once, which simplifies the basic GT practice of “constant comparison”.
– It becomes easier to understand where difficulties in delineating one concept against another come from and correspondingly easier to obtain insights as to the weaknesses of the overall current conceptual description.
In our case, the concepts needed to describe individual activities by one or both of the pair members², so a concept name is structured like a complete sentence:
code = <actor>.<description>
actor = P1 | P2 | P
description = <…>_<…>
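A naming scheme of this shape can be checked mechanically. The Python sketch below is a hedged illustration, not the authors' tooling. It assumes that a name consists of an actor (P1, P2, or P), a dot, and a description made of lowercase words joined by underscores. The inner structure of the description, the functions `parse_concept_name` and `group_by_description`, and all example names are hypothetical.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class ConceptName(NamedTuple):
    actor: str        # "P1", "P2", or "P"
    description: str  # what the actor does, e.g. "explain_code"


# Assumed shape: <actor>.<word>[_<word>...]; the word pattern is an assumption.
_NAME = re.compile(r"(P1|P2|P)\.([a-z]+(?:_[a-z]+)*)")


def parse_concept_name(name: str) -> ConceptName:
    """Split a concept name into actor and description, or raise ValueError."""
    match = _NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a well-formed concept name: {name!r}")
    return ConceptName(match.group(1), match.group(2))


def group_by_description(names):
    """Map each description to the actors it occurs with, in input order."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_concept_name(name)
        groups[parsed.description].append(parsed.actor)
    return dict(groups)


# Hypothetical concept names, invented for illustration only.
names = ["P1.explain_code", "P2.explain_code", "P.discuss_design"]
grouped = group_by_description(names)
```

Grouping by description shows how a structured name records some relationships between concepts implicitly: names that differ only in the actor fall into the same group without any extra annotation.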