The probability distribution of project completion times in simulation-based scheduling


  • KSCE Journal of Civil Engineering (2013) 17(4):638-645, DOI 10.1007/s12205-013-0147-x


    Construction Management

    The Probability Distribution of Project Completion Times in Simulation-based Scheduling

    Dong-Eun Lee*, David Arditi**, and Chang-Baek Son***

    Received March 22, 2012/Accepted July 9, 2012


    The assumption of the normality of the distribution of Project Completion Times (PCTs) in simulation-based scheduling has been generally accepted as the norm. However, it is well established in the literature that PCTs are not always normally distributed and that their distribution and variability are affected by the distribution and variability of activity durations. This paper presents an automated risk quantification method that determines the best-fit Probability Distribution Function (PDF) of PCTs. The algorithm, programmed in MATLAB, generates a set of simulation outputs obtained by systematically changing the probability distribution functions that define activity durations in a network, and analyzes the effect of different distributions of activity durations on the distribution of the PCTs. The procedure is described and the findings are presented. This easy-to-use computerized tool improves the reliability of simulation-based scheduling by calculating the exact PDFs of activity durations, simulating the network, and calculating the exact PDF of PCTs. It also simplifies the tedious process involved in finding the PDFs of the many activity durations, and is a welcome replacement for the normality assumptions used by most simulation-based scheduling researchers.

    Keywords: stochastic networks, simulation-based scheduling, project risk analysis

    1. Introduction

    Simulation-based scheduling enhances the value of traditional scheduling methods by relaxing some of the restrictive assumptions of PERT. It enhances reliability by describing the Project Completion Time (PCT) as a probability distribution. But attention has not been paid to the normality assumption that is built into these scheduling methods, nor to the opportunity to improve the reliability of these scheduling methods by finding the best-fit PDFs of the many activities in a schedule and an exact PDF of PCTs. Even though the statistical process to identify the best-fit PDFs is well established (Ang and Tang, 1975), it has not been effectively used in simulation-based scheduling.

    PERT requires three time estimates (optimistic, most likely, and pessimistic times) for each activity. The estimates determine the probability distributions of the activity durations and eventually the completion time of the entire project. PERT assumes that activity duration is a random variable whose expected value can be derived by using a simple formula. The expected mean activity durations thus calculated are used to generate PCTs and their variance. Given that the PCT is the sum of the durations of the activities located on the critical path(s), researchers argue that it can be approximated with a normal distribution because it is known that the sum of a large number of independent and identically distributed random variables will approximate a normally distributed random variable. Therefore PERT assumes that PCTs constitute normally distributed random variables even though this assumption has been challenged by many researchers since the seminal study by MacCrimmon and Ryavec (1962).
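    The PERT reasoning above can be illustrated with a short sketch. It is written in Python rather than the authors' MATLAB and uses the classical PERT formulas for the expected duration, (a + 4m + b)/6, and the variance, ((b - a)/6)^2. The three activities, their estimates, and the 30-day target are hypothetical values chosen only for illustration.

```python
import math

def pert_mean_var(a, m, b):
    """Classical PERT estimates from optimistic (a), most likely (m), pessimistic (b)."""
    mean = (a + 4 * m + b) / 6
    var = ((b - a) / 6) ** 2
    return mean, var

# Hypothetical critical path of three activities: (a, m, b) in days.
critical_path = [(2, 4, 12), (5, 8, 11), (6, 10, 20)]

stats = [pert_mean_var(*estimates) for estimates in critical_path]
pct_mean = sum(mean for mean, _ in stats)
pct_var = sum(var for _, var in stats)  # valid only if activities are independent

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal random variable."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

target = 30.0  # hypothetical target completion time in days
p_on_time = normal_cdf(target, pct_mean, math.sqrt(pct_var))
print(f"PCT mean = {pct_mean:.2f}, variance = {pct_var:.2f}, "
      f"P(PCT <= {target}) = {p_on_time:.3f}")
```

    The probability printed at the end is only as good as the normality assumption behind `normal_cdf`, which is exactly the assumption the paper questions.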

    The large majority of simulation-based scheduling methods implicitly assume that PCTs follow a normal distribution too. For example, Ang and Tang (1975) assumed that the PDF of PCTs is the same as that of activity durations, which is a Gaussian random variable. Halpin and Riggs (1992) proposed a CYCLONE-CPM simulation approach to obtain the average project completion time using the CYCLONE system (1990). Lu and AbouRizk (2000) calculated the probability of completing a project by a specified duration using a theoretical normal distribution in their simplified CPM/PERT simulation model. The assumption of normality in calculating the PCT has been accepted and used without question and without examining its validity. But if this assumption is not correct, it may lead to errors in the results. That is why the best-fit PDF of PCTs needs to be determined empirically, and automatically so that the large networks frequently encountered in practice can be handled.

    *Member, Associate Professor, School of Architecture & Civil Engineering, KyungPook National University, Daegu 702-701, Korea (Corresponding Author, E-mail:

    **Professor, Dept. of Civil and Architectural Engineering, Illinois Inst. of Tech., Chicago, IL 60616, USA (E-mail:

    ***Member, Professor, Dept. of Architectural Engineering, Semyung University, Chungbuk 390-711, Korea (E-mail:

    It is noteworthy that the durations of all activities are not always identically distributed, nor necessarily independent from each other. Sometimes the durations of different activities are modeled using different Probability Distribution Functions (PDFs) having unique parameters. Some activity durations may have very long tails, and may be skewed to the left or to the right. In such cases, it would be incorrect to assume that PCTs are normally distributed. This paper presents an automated method that generates the best-fit PDF that characterizes the PCTs obtained from network simulations, given that historical activity durations are assigned specific PDFs. The algorithm is implemented by using MATLAB (Chapra and Canale, 2002; Schilling and Harris, 2000). The best-fit PDF of PCTs can be predicted more accurately and with more confidence if one uses this method, which integrates the automated tool that finds the best-fit PDF into stochastic simulation-based scheduling.
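    A small experiment makes the point about skewed activity durations concrete. The sketch below is in Python, not the authors' MATLAB, and the toy network (two parallel paths of three serial activities, each exponentially distributed with a mean of 10 days) is an assumption made purely for illustration. It estimates the sample skewness of the simulated PCTs, which would be close to zero if they were normally distributed.

```python
import random
import statistics

def sample_skewness(xs):
    """Moment-based sample skewness (about 0 for a symmetric distribution)."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)

def simulate_pcts(n_runs, seed=1):
    """PCTs of a toy network: two parallel paths of three serial activities each."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(n_runs):
        # Exponential durations (mean 10 days) are strongly right-skewed.
        path_a = sum(rng.expovariate(1 / 10) for _ in range(3))
        path_b = sum(rng.expovariate(1 / 10) for _ in range(3))
        pcts.append(max(path_a, path_b))  # the longer path sets the PCT
    return pcts

pcts = simulate_pcts(3000)
skew = sample_skewness(pcts)
print(f"mean PCT = {statistics.fmean(pcts):.1f} days, skewness = {skew:.2f}")
```

    A clearly positive skewness indicates a long right tail, so a normal PDF fitted to these PCTs would tend to understate the chance of long overruns.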

    2. PDFs Used in Simulation-based Scheduling

    A detailed review of simulation-based scheduling methods was conducted by Adlakha and Kulkarni (1989). Much of the research deals with project completion times (Ahuja and Nandakumar, 1985; Barraza et al., 2004; Dodin and Sirvanci, 1990; Lee, 2005; Lee and Arditi, 2006; Sculli, 1983) whereas only a few researchers have attempted to determine the probability distribution of PCTs (Barraza et al., 2004; Cottrell, 1999; Lu and AbouRizk, 2000). The research studies in this field can be categorized into: (1) exact methods, (2) approximation methods, and (3) simulation methods.

    The exact methods (Dodin, 1985; Fisher et al., 1985; Hagstrom, 1990; Kulkarni and Adlakha, 1986) use a direct approach, but make some restrictive assumptions resulting in limitations. For example, Hagstrom (1990) assumes that the probability distribution of the activity duration is a discrete distribution.

    The approximation methods (Dodin, 1985; Dodin and Sirvanci, 1990; Golenko-Ginzburg, 1989; Sculli, 1989; Sculli and Wong, 1985) determine the distribution of PCTs by using an indirect approach. For example, Sculli (1983) proposes a method to compute the mean and variance of the PCT approximately. Cottrell (1999) proposes a simplified PERT by reducing the number of estimates of activity durations from three to two (i.e., most likely and pessimistic times). Finally, Kamburowski (1985) attempts to use the normal distribution rather than the PERT-beta to model activity duration.

    The simulation methods (Barraza et al., 2004; Lee, 2005; Lee and Arditi, 2006; Lee and Shi, 2004; Lu and AbouRizk, 2000; Sculli, 1983) obtain the desired PCT statistics with activity durations that have specific PDFs. After running tests, Sculli (1983) concludes that simulation is more accurate and more economical than PERT. In addition, Sculli (1989) proposes variance reduction techniques using a multivariate normal distribution to model PCTs. Barraza et al. (2004) present stochastic S-curves, which provide probability distributions for time and cost at every intermediate point and at completion. After Lee (2008) identified the non-normality issues in simulation-based scheduling, Kim et al. (2009) proposed a statistical process that effectively identifies the best-fit PDFs.
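    To give a feel for what identifying a best-fit PDF involves, the sketch below compares three candidate distributions by their maximized log-likelihood. It is an illustrative Python example, not the process of Kim et al. (2009) or the authors' MATLAB tool: the candidate set is limited to distributions whose maximum likelihood estimates have closed forms, the data are synthetic, and a fuller treatment would normally add goodness-of-fit tests.

```python
import math
import random

def loglik_normal(xs):
    """Log-likelihood of a normal PDF evaluated at its MLEs (mean, 1/n variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)

def loglik_lognormal(xs):
    """Lognormal density = normal density of log(x) divided by x (needs x > 0)."""
    logs = [math.log(x) for x in xs]
    return loglik_normal(logs) - sum(logs)

def loglik_exponential(xs):
    """Log-likelihood of an exponential PDF at its MLE (the sample mean)."""
    n = len(xs)
    mean = sum(xs) / n
    return -n * (math.log(mean) + 1)

def best_fit(xs):
    """Return the candidate with the highest log-likelihood, plus all scores."""
    scores = {
        "normal": loglik_normal(xs),
        "lognormal": loglik_lognormal(xs),
        "exponential": loglik_exponential(xs),
    }
    return max(scores, key=scores.get), scores

# Synthetic, right-skewed "completion times" used only to exercise the functions.
rng = random.Random(0)
data = [rng.lognormvariate(3.0, 0.5) for _ in range(3000)]
name, scores = best_fit(data)
print(name, {k: round(v, 1) for k, v in scores.items()})
```

    Comparing raw log-likelihoods favors candidates with more parameters, so a criterion that penalizes parameter count may be preferable when the candidates differ in complexity.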

    The reliability of network calculations depends on the probability distribution of PCTs. The distribution of PCTs may vary from a normal to an asymmetric distribution depending on factors such as the size and configuration of the network, the dependence between paths, the probability distributions assigned to activity durations, and the number of competing and/or dominating paths.

    Thirty years of experience have been accumulated about the performance of simulation-based scheduling methods. But the assumption that the simulation output is normally distributed has been used over and over again because obtaining the mean and standard deviation of a normal distribution is easier (Ang and Tang, 1975; Halpin and Riggs, 1992; Lu and AbouRizk, 2000). After Perry and Greig (1975) found that a beta distribution is appropriate when the distribution of a data set is not known, identifying the parameters of this PDF has been the subject of several studies. For example, AbouRizk et al. (1991) presented a procedure that estimates the two beta shape parameters. AbouRizk and Halpin (1992) demonstrated that most earthmoving construction operations can be described by a beta PDF. Farid and Koning (1994), Maio and Schexnayder (2000), Fente et al. (2002), and Schexnayder et al. (2005) determined the parameters of a beta PDF and confirmed that the beta distribution is a close fit for modeling construction task time distributions. These methods are particularly useful when not enough data are available or when subjective estimates are involved. In this respect, efforts to advance simulation-based scheduling have remained stagnant. It should be noted that the normality assumption of the PCT may misrepresent the variation in the data obtained from simulation and may limit the project scheduler's ability to draw inferences and make informed predictions and/or decisions.
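    As a rough illustration of what estimating the parameters of a beta PDF means, the sketch below applies the textbook method of moments to task times rescaled to the unit interval. It is not the procedure of AbouRizk et al. (1991) or of any other study cited above. The function name, the task-time bounds, and the synthetic data are assumptions made for the example, which is written in Python.

```python
import random
import statistics

def beta_method_of_moments(xs, low, high):
    """Estimate the two shape parameters of a beta PDF defined on [low, high]."""
    scaled = [(x - low) / (high - low) for x in xs]  # map the data onto [0, 1]
    mu = statistics.fmean(scaled)
    var = statistics.pvariance(scaled)
    common = mu * (1 - mu) / var - 1
    return mu * common, (1 - mu) * common

rng = random.Random(7)
# Hypothetical task times (minutes) between 10 and 30, drawn from Beta(2, 5).
durations = [10 + 20 * rng.betavariate(2, 5) for _ in range(5000)]
shape_1, shape_2 = beta_method_of_moments(durations, low=10, high=30)
print(f"estimated shape parameters: {shape_1:.2f}, {shape_2:.2f}")
```

    The method of moments needs the end points `low` and `high` to be known or estimated beforehand, which is one reason subjective estimates often enter beta-based duration models.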

    3. Methodology

    The method proposed in this study is described in the flowchart presented in Fig. 1. The algorithm runs many simulation experiments using activity durations modeled with a specific PDF and with different PDFs for different activities. Several sets of PCTs are generated by the algorithm presented in Fig. 1.

    Step : A network schedule is modeled using Primavera Project Planner (P3). The schedule data exported from P3 are read by the system.

    Step : The schedule data read in the preceding step are converted into an appropriate data structure for simulation runs. The converted schedule data are saved in an Excel spreadsheet file.

    Step : The deterministic activity duration data are read by the system.

    Step : In CPM mode, a set of predefined deterministic durations are assigned to the activities.

    Step : Deterministic CPM calculations are performed. The PCT is saved in the computer's memory.


    Step : In PERT mode, the activities' most likely times are represented by predefined deterministic durations. The most likely times are then used to compute optimistic and pessimistic times based on the assumption that the Coefficient of Variation (COV) is 20% as in Ang and Tang (1975). It is noteworthy that the COV is adjustable according to the user's preference. The three time estimates (i.e., optimistic, most likely, and pessimistic times) thus generated are used in the probabilistic PERT mode.

    Step : Probabilistic PERT calculations are performed. The mean and variance of the PCTs are computed and saved in the computer's memory.

    Step : In Simulation mode, the durations of the activities are generated by simulation. The deterministic activity durations read earlier are used as the mean values of the exponential distribution as described later in Case I. Since the coefficient of variation is taken as 20%, the system automatically generates a user-defined number of random variates for each activity by using the activity's probability distribution. The random variates of activity durations are saved in an Excel spreadsheet file.

    Step : The Maximum Likelihood Estimates (MLEs) of the parameters of the PDFs (e.g., normal, lognormal, beta, uniform, exponential, gamma, generalized extreme value, extreme value, and Weibull) are computed by using the activity durations obtained in the preceding step.

    Step : A PDF is selected with respect to the MLE information calculated in the preceding step. It is also possible to specify a PDF depending on the user's preference. Seven PDFs (normal, lognormal, beta, uniform, triangular, exponential, and Weibull) were specified in this study in addition to the deterministic and PERT-Beta values.

    Step : Stochastic schedule simulation (S3) is conducted. The S3 algorithm is described in the steps that follow.

    Step : The number of iterations is set by the user of the system. In the experiments conducted in this study, 3,000 activity durations were generated for each activity for each PDF.

    Step : The initial number of iterations is set to zero.

    Step : Activity durations are generated using a random number generator that produces random variates using the PDFs and their MLEs estimated earlier. The random number generation functions (e.g., Normrnd for normal, Lognrnd for lognormal, Betarnd for beta, Unifrnd for uniform, Trirnd for triangular, Exprnd for exponential, and Webrnd for Weibull distributions) available in MATLAB are used to generate activity durations based on the PDF assigned to an activity. The kernel smoothing function in MATLAB is used to convert the discrete data into a continuous distribution of PCTs.

    Step : The forward pass algorithm is executed by using the random durations generated in the preceding step. PCTs are obtained after performing the CPM calculations. The project completion time corresponds to the event time of the end node.

    Fig. 1. Algorithm to Establish the Distribution of PCTs Using Simulation

    Step : After the maximum number of specified simulation iterations is completed, the PCTs obtained in the preceding step are collected and saved in a vector. When 3,000 simulation iterations are completed, 3,000 sets of PCTs are obtained and saved in respective vectors.

    Step : If the total number of simulation runs is below the predefined number of iterations, the program repeats the preceding simulation steps. The algorithm moves to the next step as soon as the number of iterations reaches the maximum number of iterations set by the user (3,000 in this study). The for repetition structure is used to repeat these steps for the specified number of iterations.

    Step : After getting a set of 3,000 PCTs, the minimum number of simulation runs is calculated (Ang and Tang, 1975; Lee and Arditi, 2006).

    Step : The system checks whether the simulation experiment passes the maturity test, i.e., whether more than 3,000 iterations are necessary. The algorithm compares if the minimum number of simulation runs determined i...
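    The simulation loop described in the steps above (generate random durations, run the forward pass, collect the PCT, and repeat) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB implementation. The four-activity network, the use of normally distributed durations with a 20% coefficient of variation, and all function names are hypothetical.

```python
import random

# Hypothetical activity-on-node network: name -> (mean duration, predecessors).
# Activities are listed in topological order (predecessors come first).
NETWORK = {
    "A": (5.0, []),
    "B": (8.0, ["A"]),
    "C": (6.0, ["A"]),
    "D": (4.0, ["B", "C"]),
}

def forward_pass(durations):
    """Return the PCT, i.e. the largest early-finish time in the network."""
    early_finish = {}
    for name, (_, preds) in NETWORK.items():
        early_start = max((early_finish[p] for p in preds), default=0.0)
        early_finish[name] = early_start + durations[name]
    return max(early_finish.values())

def sample_durations(rng, cov=0.20):
    """Draw one normal duration per activity with std = cov * mean, floored at 0."""
    return {name: max(0.0, rng.gauss(mean, cov * mean))
            for name, (mean, _) in NETWORK.items()}

def simulate(n_iterations=3000, seed=42):
    """Repeat sampling and the forward pass, collecting one PCT per iteration."""
    rng = random.Random(seed)
    return [forward_pass(sample_durations(rng)) for _ in range(n_iterations)]

# Deterministic CPM result for comparison, using the mean durations.
deterministic_pct = forward_pass({name: mean for name, (mean, _) in NETWORK.items()})
pcts = simulate()
mean_pct = sum(pcts) / len(pcts)
print(f"CPM PCT = {deterministic_pct:.1f}, simulated mean PCT = {mean_pct:.2f}")
```

    In this toy network the simulated mean PCT exceeds the deterministic CPM value because the merge at activity D always waits for the slower of the two parallel branches.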

