Software Effort Estimation …
Thomas Menguy | January 11, 2006
Here is an old article I wrote when I was a student about software metrics and effort estimation. As my day-to-day job requires rough estimates of software projects at an early stage, I have begun to train myself on the subject and resurrected this old essay.
Concrete Estimation (size, effort, schedule).
This little overview is mainly derived from the book McConnell, Steve, Rapid Development, Microsoft Press, 1996, and from Softwaremetrics.com for the function point description.
Contents
- Estimation process Overview
- Size Estimation
- Effort Estimation
- Schedule Estimation
- Ballpark Schedule Estimation
Estimation and Software Projects
Most projects overshoot their estimated schedules by anywhere from 25 to 100 percent, but a few organizations have achieved schedule-prediction accuracies to within 10 percent, and 5 percent is not unheard of (Jones, 1994).
Without an accurate schedule estimate, there is no foundation for effective planning and no support for rapid development.
- Estimation process Overview :
Software estimation is difficult, and what some people try to do with software estimation isn’t even theoretically possible. Upper management, lower management, customers, and some developers don’t seem to understand why estimation is so hard. People who don’t understand software estimation’s inherent difficulties can play an unwitting role in making estimation even harder than it already is.
The basic software estimation story is a process of gradual refinement. Until each feature is understood in detail, you can’t estimate the cost of a program precisely. The following table will help you find a ballpark range for your current estimate:
Phase | Effort and Size: Optimistic | Effort and Size: Pessimistic | Schedule: Optimistic | Schedule: Pessimistic |
---|---|---|---|---|
Initial Product Concept | 0.25 | 4.0 | 0.60 | 1.60 |
Approved Product Concept | 0.5 | 2.0 | 0.80 | 1.25 |
Requirements Specification | 0.67 | 1.5 | 0.85 | 1.15 |
Product Design Specification | 0.80 | 1.25 | 0.90 | 1.10 |
Detailed Design Specification | 0.90 | 1.10 | 0.95 | 1.05 |
To use the factors in the table, simply multiply your “most likely” single-point estimate by the optimistic and pessimistic factors for your current phase, and you will be able to present your estimate as a range rather than a single point.
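As a minimal sketch of that multiplication, the snippet below hard-codes the multipliers from the table and turns a single-point estimate into a range. The function name `estimate_range` and the example figure of 50 man-months are illustrative assumptions, not part of the original method.

```python
# Multipliers copied from the table above: phase -> (optimistic, pessimistic).
EFFORT_AND_SIZE = {
    "initial product concept": (0.25, 4.0),
    "approved product concept": (0.5, 2.0),
    "requirements specification": (0.67, 1.5),
    "product design specification": (0.80, 1.25),
    "detailed design specification": (0.90, 1.10),
}
SCHEDULE = {
    "initial product concept": (0.60, 1.60),
    "approved product concept": (0.80, 1.25),
    "requirements specification": (0.85, 1.15),
    "product design specification": (0.90, 1.10),
    "detailed design specification": (0.95, 1.05),
}


def estimate_range(most_likely, phase, multipliers):
    """Return (optimistic, pessimistic) bounds around a single-point estimate."""
    optimistic, pessimistic = multipliers[phase]
    return most_likely * optimistic, most_likely * pessimistic


# Hypothetical example: 50 man-months estimated at the approved-concept stage.
low, high = estimate_range(50, "approved product concept", EFFORT_AND_SIZE)
print(f"effort between {low:g} and {high:g} man-months")
```

The same function works for schedules by passing `SCHEDULE` instead of `EFFORT_AND_SIZE`.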
The process of creating an accurate development schedule consists of three steps:
- Estimate the Size of the product (number of lines of code or function points)
- Estimate the Effort (man-months)
- Estimate the Schedule (calendar months)
You can estimate the size of a project in any of several ways:
- Use an algorithmic approach such as function points, that estimates program size from program features.
- Use size estimation software.
- If you have already worked on a similar project and know its size, estimate each major piece of the new system as a percentage of the size of a similar piece of the old system. Estimate the total size of the new system by adding up the estimated sizes of each of the pieces.
We will now look at an approach other than lines-of-code sizing:
Function-Point Estimation:
Function Point Analysis was first developed by Allan J. Albrecht in the mid 1970s. It was an attempt to overcome difficulties associated with lines of code as a measure of software size, and to assist in developing a mechanism to predict effort associated with software development. The method was first published in 1979, then again in 1983. In 1984 Albrecht refined the method, and since 1986, when the International Function Point Users Group (IFPUG) was set up, several versions of the Function Point Counting Practices Manual have been published by IFPUG. The current version of the IFPUG Manual is 4.1.
The number of function points in a program is based on the number and complexity of each of the following items:
- External Inputs (EI) - an elementary process in which data crosses the boundary from outside to inside. This data may come from a data input screen or another application. The data may be used to maintain one or more internal logical files. The data can be either control information or business information. If the data is control information, it does not have to update an internal logical file. In practice: screens, forms, boxes, controls, or messages through which an end-user or other program adds, deletes or changes a program’s data. This includes any input that has a unique format or unique processing logic.
- External Outputs (EO) - an elementary process in which derived data passes across the boundary from inside to outside. Additionally, an EO may update an ILF. The data creates reports or output files sent to other applications. These reports and files are created from one or more internal logical files and external interface files. In practice: screens, reports, graphs, or messages that the program generates for use by an end-user or other program. This includes any output that has a different format or requires different processing logic than other output types.
- External Inquiry (EQ) - an elementary process with both input and output components that results in data retrieval from one or more internal logical files and external interface files. The input process does not update any Internal Logical Files, and the output side does not contain derived data. In practice: input/output combinations in which an input results in an immediate, simple output. The term originated in the database world and refers to a direct search for specific data, usually using a single key. In modern GUI applications, the line between inquiries and outputs is blurry; generally, however, queries retrieve data directly from a database and only provide rudimentary formatting, whereas outputs can process, combine, or summarize complex data and can be highly formatted.
- Internal Logical Files (ILF’s) - a user identifiable group of logically related data that resides entirely within the application’s boundary and is maintained through external inputs. In practice: major logical groups of end-user data or control information that are completely controlled by the program. A logical file might consist of a single flat file or a single table in a relational database.
- External Interface Files (EIF’s) - a user identifiable group of logically related data that is used for reference purposes only. The data resides entirely outside the application and is maintained by another application. The external interface file is an internal logical file for another application. In practice: files controlled by other programs with which the program being counted interacts. This includes each major logical group of data or control information that enters or leaves the program.
After the components have been classified as one of the five major components (EI’s, EO’s, EQ’s, ILF’s or EIF’s), a ranking of low, average or high is assigned. For transactions (EI’s, EO’s, EQ’s) the ranking is based upon the number of file types referenced or updated (FTR’s) and the number of data element types (DET’s). For both ILF’s and EIF’s the ranking is based upon record element types (RET’s) and data element types (DET’s). A record element type is a user recognizable subgroup of data elements within an ILF or EIF. A data element type is a unique, user recognizable, nonrecursive field. Each of the following tables assists in the ranking process; the numerical ratings appear in the function point table further below. For example, an EI that references or updates 2 file types (FTR’s) and has 7 data elements would be assigned a ranking of average and an associated rating of 4. Here, FTR’s are the combined number of Internal Logical Files (ILF’s) referenced or updated and External Interface Files (EIF’s) referenced.
External Inputs (EI):
FTR’s | 1-4 DET’s | 5-15 DET’s | >15 DET’s |
---|---|---|---|
0-1 | Low | Low | Average |
2 | Low | Average | High |
>2 | Average | High | High |
External Outputs (EO) and External Inquiries (EQ):
FTR’s | 1-5 DET’s | 6-19 DET’s | >19 DET’s |
---|---|---|---|
0-1 | Low | Low | Average |
2-3 | Low | Average | High |
>3 | Average | High | High |
Like all components, EQ’s are rated and scored. Basically, an EQ is rated (Low, Average or High) like an EO, but assigned a value like an EI. The rating is based upon the total number of unique data element types (DET’s) and file types referenced (FTR’s), combining the input and output sides. If the same FTR is used on both the input and output side, it is counted only once. If the same DET is used on both the input and output side, it is counted only once.
For both ILF’s and EIF’s, the number of record element types (RET’s) and the number of data element types (DET’s) are used to determine a ranking of low, average or high. A Record Element Type is a user recognizable subgroup of data elements within an ILF or EIF. A Data Element Type is a unique, user recognizable, nonrecursive field on an ILF or EIF.
Internal Logical Files (ILF’s) and External Interface Files (EIF’s):
RET’s | 1-19 DET’s | 20-50 DET’s | >50 DET’s |
---|---|---|---|
1 | Low | Low | Average |
2-5 | Low | Average | High |
>5 | Average | High | High |
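The three ranking tables above share the same Low/Average/High grid and differ only in their band limits, so they can be sketched as one lookup. This is an illustrative sketch: the function name `rank` is hypothetical, and it assumes that the first table applies to EI’s, the second to EO’s and EQ’s, and the third to ILF’s and EIF’s.

```python
import bisect

# Grid shared by the three tables: rows are FTR/RET bands, columns are DET bands.
RANKS = (
    ("Low", "Low", "Average"),
    ("Low", "Average", "High"),
    ("Average", "High", "High"),
)

# Upper limits of the first two bands on each axis; anything larger falls in
# the third band. Assumption: table 1 = EI, table 2 = EO and EQ, table 3 = ILF and EIF.
BANDS = {
    "EI": ((1, 2), (4, 15)),
    "EO": ((1, 3), (5, 19)),
    "EQ": ((1, 3), (5, 19)),
    "ILF": ((1, 5), (19, 50)),
    "EIF": ((1, 5), (19, 50)),
}


def rank(component, references, data_elements):
    """Rank one component.

    `references` is the FTR count for EI/EO/EQ and the RET count for ILF/EIF;
    `data_elements` is the DET count.
    """
    row_limits, column_limits = BANDS[component]
    row = bisect.bisect_left(row_limits, references)
    column = bisect.bisect_left(column_limits, data_elements)
    return RANKS[row][column]


# The example from the text: an EI with 2 FTR's and 7 data elements.
print(rank("EI", 2, 7))
```

Keeping the band limits in data rather than in nested `if` statements makes it easy to check each entry against the tables.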
The counts for each level of complexity for each type of component can be entered into a table such as the following one. Each count is multiplied by the numerical rating shown to determine the rated value. The rated values on each row are summed across the table, giving a total value for each type of component. These totals are then summed down to arrive at the Total Number of Unadjusted Function Points.
Program Characteristic | Low Complexity | Average Complexity | High Complexity | Total |
---|---|---|---|---|
Number of External Inputs | __*3= | __*4= | __*6= | __ |
Number of External Outputs | __*4= | __*5= | __*7= | +__ |
Number of External Inquiries | __*3= | __*4= | __*6= | +__ |
Number of Internal Logical Files | __*7= | __*10= | __*15= | +__ |
Number of External Interface Files | __*5= | __*7= | __*10= | +__ |
Total Number of Unadjusted Function Points | | | | =__ |
Multiplied by Value Adjustment Factor (VAF) | | | | *__ |
Total Adjusted Function Points | | | | =__ |
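Filling in the table amounts to a weighted sum, which can be sketched as follows. The weights are copied from the table; the function name and the component counts in the example are hypothetical.

```python
# Weights copied from the table above.
WEIGHTS = {
    "EI": {"low": 3, "average": 4, "high": 6},
    "EO": {"low": 4, "average": 5, "high": 7},
    "EQ": {"low": 3, "average": 4, "high": 6},
    "ILF": {"low": 7, "average": 10, "high": 15},
    "EIF": {"low": 5, "average": 7, "high": 10},
}


def unadjusted_function_points(counts):
    """Sum count * weight over every component type and complexity level."""
    total = 0
    for component, per_level in counts.items():
        for level, count in per_level.items():
            total += count * WEIGHTS[component][level]
    return total


# Hypothetical counts for a small application.
example_counts = {
    "EI": {"low": 6, "average": 2},
    "EO": {"average": 4},
    "EQ": {"low": 3},
    "ILF": {"low": 2, "average": 1},
    "EIF": {"low": 1},
}
print(unadjusted_function_points(example_counts))
```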
The value adjustment factor (VAF) is based on 14 general system characteristics (GSC’s) that rate the general functionality of the application being counted. The VAF ranges from 0.65 to 1.35.
The degrees of influence range on a scale of 0 to 5, from no influence to strong influence.
The table below is intended to provide an overview of each GSC.
# | General System Characteristic | Brief Description |
---|---|---|
1. | Data communications | How many communication facilities are there to aid in the transfer or exchange of information with the application or system? |
2. | Distributed data processing | How are distributed data and processing functions handled? |
3. | Performance | Was response time or throughput required by the user? |
4. | Heavily used configuration | How heavily used is the current hardware platform where the application will be executed? |
5. | Transaction rate | How frequently are transactions executed daily, weekly, monthly, etc.? |
6. | On-Line data entry | What percentage of the information is entered On-Line? |
7. | End-user efficiency | Was the application designed for end-user efficiency? |
8. | On-Line update | How many ILF’s are updated by On-Line transaction? |
9. | Complex processing | Does the application have extensive logical or mathematical processing? |
10. | Reusability | Was the application developed to meet one or many user’s needs? |
11. | Installation ease | How difficult is conversion and installation? |
12. | Operational ease | How effective and/or automated are start-up, back-up, and recovery procedures? |
13. | Multiple sites | Was the application specifically designed, developed, and supported to be installed at multiple sites for multiple organizations? |
14. | Facilitate change | Was the application specifically designed, developed, and supported to facilitate change? |
Once all 14 GSC’s have been answered, they should be tabulated using the IFPUG Value Adjustment Equation:
VAF = 0.65 + (Sc / 100)
where: Ci = degree of influence for each General System Characteristic, assigned by you on a scale of 0 (no influence) to 5 (strong influence)
Sc = sum of the Ci (Sc = C1 + C2 + … + C14)
The final Function Point Count is obtained by multiplying the VAF times the Unadjusted Function Point (UAF).
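The adjustment step can be sketched as below, using the IFPUG equation VAF = 0.65 + Sc/100, which matches the 0.65 to 1.35 range quoted earlier for fourteen characteristics scored 0 to 5. The function names and the example scores are illustrative assumptions.

```python
def value_adjustment_factor(degrees_of_influence):
    """Compute the VAF from the 14 GSC scores, each between 0 and 5."""
    if len(degrees_of_influence) != 14:
        raise ValueError("expected one score for each of the 14 GSC's")
    if any(not 0 <= score <= 5 for score in degrees_of_influence):
        raise ValueError("each degree of influence must be between 0 and 5")
    return 0.65 + sum(degrees_of_influence) / 100


def adjusted_function_points(unadjusted, degrees_of_influence):
    """Multiply the unadjusted count by the VAF."""
    return unadjusted * value_adjustment_factor(degrees_of_influence)


# Hypothetical project: 84 unadjusted function points, every GSC scored 3.
print(adjusted_function_points(84, [3] * 14))
```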
Now you can compute cost, effort and schedule from the data of previous projects, or use Jones’s First-Order Estimation Practice to find a rough schedule.
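Estimating from previous projects can be as simple as deriving an average productivity from your own history. The sketch below assumes that effort scales linearly with function points, which is a simplification, and every figure in `history` is invented for illustration.

```python
def man_months_per_function_point(history):
    """Average productivity over past projects given as (function points, man-months)."""
    total_points = sum(points for points, _ in history)
    total_effort = sum(effort for _, effort in history)
    return total_effort / total_points


def estimate_effort(function_points, history):
    """Scale a new function-point count by the historical productivity."""
    return function_points * man_months_per_function_point(history)


# Hypothetical history: (function points, man-months) for three past projects.
history = [(200, 20.0), (350, 42.0), (450, 38.0)]
print(estimate_effort(300, history))
```

The same calibration idea applies to cost per function point or to lines of code per function point, as long as the past projects resemble the new one.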
Function points do not rely on any technology. There is nevertheless a relationship between function points and the number of lines of code, which you can derive from your team or company data.
Estimation tips:
- Avoid off-the-cuff estimates and simple guesses: take the time to estimate, and never answer before having quietly worked on the estimate.
- Allow time for the estimate and plan it.
- Use data from previous projects.
- Use developer-based estimates: rely on estimates made by the developers who will do the work.
- Estimate by walk-through: have each team member estimate pieces of the project individually, then work until you reach consensus on the high and low ends of the estimation ranges.
- Estimate by categories: easy, medium, hard…
- Estimate at a low level of detail: a 10% error on one big piece is 10% high or 10% low, whereas 10% errors on 50 small pieces tend to cancel each other out.
- Don’t omit common tasks such as: cutover, data conversion, installation, customization, management of the beta test program, demonstrating the program to customers or users, attendance at change-control meetings, maintenance work on existing systems during the project, defect corrections, administration related to defect tracking, coordination with QA, support for user documentation, review of technical documents, integration, vacations, holidays, sick days, company and department meetings, and training.
- Use several different estimation techniques and compare the results.
- Change estimation practices as the project progresses.
- Don’t forget risk management in your estimate.
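The tip about estimating at a low level of detail can be illustrated with a small simulation. It rests on two loud assumptions: all pieces have the same size, and each piece is independently estimated either 10% too high or 10% too low with equal probability. Real estimates are often biased in one direction, in which case the errors do not cancel this neatly.

```python
import random


def mean_total_error(pieces, piece_error=0.10, trials=2000, seed=1):
    """Average absolute error of the total, as a fraction of the true total."""
    rng = random.Random(seed)
    accumulated = 0.0
    for _ in range(trials):
        # Assumption: each equally sized piece is 10% high or 10% low, independently.
        signed_error = sum(
            rng.choice((-piece_error, piece_error)) for _ in range(pieces)
        )
        accumulated += abs(signed_error) / pieces
    return accumulated / trials


print(mean_total_error(1))   # one big piece: the total is always off by 10%
print(mean_total_error(50))  # fifty small pieces: much of the error cancels
```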
You’ll need an effort estimate (man-months) in order to know how many people to put on your project; having an effort estimate also makes it easy to derive the schedule estimate. You can derive the effort estimate in any of several ways:
- Use estimation software
- Use the schedule tables in Ballpark Schedule Estimation
- Use your organization’s historical data.
- Use an algorithmic approach such as COCOMO (Boehm 1981) or Putnam and Myers’s lifecycle model (Putnam and Myers 1992) to convert a lines-of-code estimate into an effort estimate.
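To give an idea of what an algorithmic approach looks like, here is a sketch of the Basic COCOMO equations from Boehm’s 1981 model: effort = a × KLOC^b and schedule = c × effort^d. The coefficients are the commonly quoted Basic COCOMO values and should be checked against the book before being relied upon; the function name and the 10,000-line example are assumptions. The more detailed COCOMO variants add cost drivers that this sketch ignores.

```python
# Commonly quoted Basic COCOMO coefficients (a, b, c, d) per project mode.
# Assumption: verify against Boehm 1981 before using them for real estimates.
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semi-detached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(lines_of_code, mode="organic"):
    """Return (effort in man-months, schedule in calendar months)."""
    a, b, c, d = BASIC_COCOMO[mode]
    kloc = lines_of_code / 1000
    effort = a * kloc ** b
    schedule = c * effort ** d
    return effort, schedule


effort, schedule = basic_cocomo(10_000)
print(f"{effort:.1f} man-months over {schedule:.1f} months")
```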
You can compute the schedule from the effort estimate by using :
You can also use the tables in Ballpark Schedule Estimation below to compute a schedule and effort from a size in lines of code, or:
Jones’s First-Order Estimation Practice
Once you have the function-point count, raise it to the appropriate power selected in the table below. The exponents in the table are derived from Jones’s analysis of his database of several thousand projects.
Kind of Software | Best in Class | Average | Worst In Class |
---|---|---|---|
Systems | 0.43 | 0.45 | 0.48 |
Business | 0.41 | 0.43 | 0.46 |
Shrink-wrap | 0.39 | 0.42 | 0.45 |
This practice isn’t a substitute for more careful schedule estimation, but it does provide a simple means of getting a rough schedule that’s better than guessing.
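The practice boils down to a single exponentiation, sketched below with the exponents from the table. The result is read as a rough schedule in calendar months, following McConnell’s presentation of the practice; the function name and the 350-function-point example are illustrative.

```python
# Exponents copied from the table above.
EXPONENTS = {
    "systems": {"best": 0.43, "average": 0.45, "worst": 0.48},
    "business": {"best": 0.41, "average": 0.43, "worst": 0.46},
    "shrink-wrap": {"best": 0.39, "average": 0.42, "worst": 0.45},
}


def rough_schedule_months(function_points, kind, organization="average"):
    """Raise the function-point count to the exponent for this kind of software."""
    return function_points ** EXPONENTS[kind][organization]


# Example: a 350-function-point shrink-wrap product in an average organization.
print(round(rough_schedule_months(350, "shrink-wrap"), 1))
```

Running the three columns for the same function-point count gives a quick best-case to worst-case spread.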
The following tables describe three kinds of projects:
- Systems Software: includes OS software, device drivers, compilers, and code libraries.
- Business Software: in-house systems that are used by a single organization. They run on a limited set of hardware, perhaps only a single computer. Payroll systems, accounting systems, and inventory control systems, as well as IS, IT and MIS software, are in that category.
- Shrink-wrap Software: software that is packaged and sold commercially (word processors and spreadsheets, but also financial analysis software, screenplay-writing and legal case management programs).
Systems software does not include embedded software, firmware, real-time systems, scientific software and the like. Productivity for these kinds of systems would be much lower. For your particular project, you can mix the models, for example 40% Business and 60% Shrink-wrap, and recompute the schedule and effort obtained from the following tables with these proportions.
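One way to read “recompute with these proportions” is as a weighted average of the table values, which is the interpretation assumed in this sketch. The figures are the Business and Shrink-wrap entries of the 50,000-line row in the last of the three tables that follow; the function name `blend` is hypothetical.

```python
def blend(values_by_model, proportions):
    """Weighted average of per-model table values; proportions must sum to 1."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(values_by_model[model] * share for model, share in proportions.items())


proportions = {"business": 0.4, "shrink-wrap": 0.6}
# 50,000 lines of code, last table: business 11 months / 71 man-months,
# shrink-wrap 14 months / 115 man-months.
schedule = blend({"business": 11, "shrink-wrap": 14}, proportions)
effort = blend({"business": 71, "shrink-wrap": 115}, proportions)
print(f"{schedule:.1f} months, {effort:.1f} man-months")
```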
Before using these tables, you may want to shorten the schedule. Here is how to recompute the effort (possible if you start from the nominal project table):
compressed schedule effort = (initial schedule / compressed schedule) * initial effort
For example, if you have an initial schedule of 12 months and an initial effort of 78 man-months, and you want a 10-month schedule, that yields a compressed schedule effort of 94 man-months, which means that the 17 percent reduction in the schedule requires a 21 percent increase in effort.
Most researchers have concluded that it isn’t possible to achieve a schedule compression factor lower than about 0.75-0.80 (Boehm 1981; Putnam and Myers 1992, Jones 1994).
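The compression arithmetic from the 12-month example can be sketched as follows: effort grows in proportion to the ratio of the initial schedule to the compressed one (78 × 12 / 10 = 93.6, rounded to 94 in the text). The function name is hypothetical, and the default floor of 0.75 is an assumption taken from the low end of the 0.75-0.80 range quoted above.

```python
def compressed_effort(initial_schedule, initial_effort, target_schedule, floor=0.75):
    """Effort needed to fit the work into a shorter schedule.

    Raises ValueError when the requested compression goes below `floor`,
    the assumed limit of what schedule compression can achieve.
    """
    compression_factor = target_schedule / initial_schedule
    if compression_factor < floor:
        raise ValueError("schedule compressed beyond the practical limit")
    return initial_effort * initial_schedule / target_schedule


# The example from the text: 12 months and 78 man-months squeezed into 10 months.
print(round(compressed_effort(12, 78, 10)))
```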
Shortest possible schedules:
System Size (lines of code) | Systems: Schedule (calendar months) | Systems: Effort (man-months) | Business: Schedule (calendar months) | Business: Effort (man-months) | Shrink-Wrap: Schedule (calendar months) | Shrink-Wrap: Effort (man-months) |
---|---|---|---|---|---|---|
10,000 | 6 | 25 | 3.5 | 5 | 4.2 | 8 |
15,000 | 7 | 40 | 4.1 | 8 | 4.9 | 13 |
20,000 | 8 | 57 | 4.6 | 11 | 5.6 | 19 |
25,000 | 9 | 74 | 5.1 | 15 | 6 | 24 |
30,000 | 9 | 110 | 5.5 | 22 | 7 | 37 |
35,000 | 10 | 130 | 5.8 | 26 | 7 | 44 |
40,000 | 11 | 170 | 6 | 34 | 7 | 57 |
45,000 | 11 | 195 | 6 | 39 | 8 | 66 |
50,000 | 11 | 230 | 7 | 46 | 8 | 79 |
60,000 | 12 | 285 | 7 | 57 | 9 | 98 |
70,000 | 13 | 350 | 8 | 71 | 9 | 120 |
80,000 | 14 | 410 | 8 | 83 | 10 | 140 |
90,000 | 14 | 480 | 9 | 96 | 10 | 170 |
100,000 | 15 | 540 | 9 | 110 | 11 | 190 |
120,000 | 16 | 680 | 10 | 140 | 11 | 240 |
140,000 | 17 | 820 | 10 | 160 | 12 | 280 |
160,000 | 18 | 960 | 10 | 190 | 13 | 335 |
180,000 | 19 | 1,100 | 11 | 220 | 13 | 390 |
200,000 | 20 | 1,250 | 11 | 250 | 14 | 440 |
250,000 | 22 | 1,650 | 13 | 330 | 15 | 580 |
300,000 | 24 | 2,100 | 14 | 420 | 16 | 725 |
400,000 | 27 | 2,900 | 15 | 590 | 19 | 1,000 |
500,000 | 30 | 3,900 | 17 | 780 | 20 | 1,400 |
Efficient schedules:
System Size (lines of code) | Systems: Schedule (calendar months) | Systems: Effort (man-months) | Business: Schedule (calendar months) | Business: Effort (man-months) | Shrink-Wrap: Schedule (calendar months) | Shrink-Wrap: Effort (man-months) |
---|---|---|---|---|---|---|
10,000 | 8 | 24 | 4.9 | 5 | 5.9 | 8 |
15,000 | 10 | 38 | 5.8 | 8 | 7 | 12 |
20,000 | 11 | 54 | 7 | 11 | 8 | 18 |
25,000 | 12 | 70 | 7 | 14 | 9 | 23 |
30,000 | 13 | 97 | 8 | 20 | 9 | 32 |
35,000 | 14 | 120 | 8 | 24 | 10 | 39 |
40,000 | 15 | 140 | 9 | 30 | 10 | 49 |
45,000 | 16 | 170 | 9 | 34 | 11 | 57 |
50,000 | 16 | 190 | 10 | 40 | 11 | 67 |
60,000 | 18 | 240 | 10 | 49 | 12 | 83 |
70,000 | 19 | 290 | 11 | 61 | 13 | 100 |
80,000 | 20 | 345 | 12 | 71 | 14 | 120 |
90,000 | 21 | 400 | 12 | 82 | 15 | 140 |
100,000 | 22 | 450 | 13 | 93 | 15 | 160 |
120,000 | 23 | 560 | 14 | 115 | 16 | 195 |
140,000 | 25 | 670 | 15 | 140 | 17 | 235 |
160,000 | 26 | 790 | 15 | 160 | 18 | 280 |
180,000 | 28 | 910 | 16 | 190 | 19 | 320 |
200,000 | 29 | 1,050 | 17 | 210 | 20 | 360 |
250,000 | 32 | 1,300 | 19 | 280 | 22 | 470 |
300,000 | 34 | 1,650 | 20 | 345 | 24 | 590 |
400,000 | 38 | 2,350 | 22 | 490 | 27 | 830 |
500,000 | 42 | 3,100 | 25 | 640 | 29 | 1,100 |
Nominal schedules:
System Size (lines of code) | Systems: Schedule (calendar months) | Systems: Effort (man-months) | Business: Schedule (calendar months) | Business: Effort (man-months) | Shrink-Wrap: Schedule (calendar months) | Shrink-Wrap: Effort (man-months) |
---|---|---|---|---|---|---|
10,000 | 10 | 48 | 6 | 9 | 7 | 15 |
15,000 | 12 | 76 | 7 | 15 | 8 | 24 |
20,000 | 14 | 110 | 8 | 21 | 9 | 34 |
25,000 | 15 | 140 | 9 | 27 | 10 | 44 |
30,000 | 16 | 185 | 9 | 37 | 11 | 59 |
35,000 | 17 | 220 | 10 | 44 | 12 | 71 |
40,000 | 18 | 270 | 10 | 54 | 13 | 88 |
45,000 | 19 | 310 | 11 | 61 | 13 | 100 |
50,000 | 20 | 360 | 11 | 71 | 14 | 115 |
60,000 | 21 | 440 | 12 | 88 | 15 | 145 |
70,000 | 23 | 540 | 13 | 105 | 16 | 175 |
80,000 | 24 | 630 | 14 | 125 | 17 | 210 |
90,000 | 25 | 730 | 15 | 140 | 17 | 240 |
100,000 | 26 | 820 | 15 | 160 | 18 | 270 |
120,000 | 28 | 1,000 | 16 | 200 | 20 | 335 |
140,000 | 30 | 1,200 | 17 | 240 | 21 | 400 |
160,000 | 32 | 1,400 | 18 | 280 | 22 | 470 |
180,000 | 34 | 1,600 | 19 | 330 | 23 | 540 |
200,000 | 35 | 1,900 | 20 | 370 | 24 | 610 |
250,000 | 38 | 2,400 | 22 | 480 | 26 | 800 |
300,000 | 41 | 3,000 | 24 | 600 | 29 | 1,000 |
400,000 | 47 | 4,200 | 27 | 840 | 32 | 1,400 |
500,000 | 51 | 5,500 | 29 | 1,100 | 35 | 1,800 |
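For sizes that fall between two rows, one option is to interpolate. The sketch below copies a few Business Products rows from the last table above and interpolates linearly between them; linear interpolation is an assumption of this sketch, not something the tables prescribe, and the function name `ballpark` is hypothetical.

```python
import bisect

# (lines of code, schedule in calendar months, effort in man-months),
# a few Business Products rows copied from the last table above.
BUSINESS_ROWS = [
    (10_000, 6, 9),
    (20_000, 8, 21),
    (50_000, 11, 71),
    (100_000, 15, 160),
]


def ballpark(lines_of_code, rows=BUSINESS_ROWS):
    """Return (schedule, effort), interpolating linearly between table rows."""
    sizes = [row[0] for row in rows]
    if not sizes[0] <= lines_of_code <= sizes[-1]:
        raise ValueError("size outside the range covered by the table")
    index = bisect.bisect_left(sizes, lines_of_code)
    if sizes[index] == lines_of_code:
        # Exact row: no interpolation needed.
        return float(rows[index][1]), float(rows[index][2])
    (size0, months0, effort0), (size1, months1, effort1) = rows[index - 1], rows[index]
    fraction = (lines_of_code - size0) / (size1 - size0)
    return (
        months0 + fraction * (months1 - months0),
        effort0 + fraction * (effort1 - effort0),
    )


print(ballpark(15_000))
```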
Bibliography
- McConnell, Steve. Rapid Development. Microsoft Press, 1996. Presents all the factors needed to achieve rapid development, from risk evaluation, good practices and classic mistakes to team psychology and negotiating. Really great.
- Boehm, Barry W. Software Engineering Economics. Englewood Cliffs, N.J.: Prentice Hall, 1981. The COCOMO cost-estimation model, by its creator.
- DeMarco, Tom. Controlling Software Projects. New York: Yourdon Press, 1982. Describes several estimation models.
- Putnam, Lawrence H., and Ware Myers. Measures for Excellence: Reliable Software on Time, Within Budget. Englewood Cliffs, N.J.: Yourdon Press, 1992. Presents a full-fledged software-project estimation method. Explains how to calibrate a simple cost-estimation model to your organization and how to use it to estimate medium to large projects.
- Jones, Capers. Assessment and Control of Software Risks. Englewood Cliffs, N.J.: Yourdon Press, 1994. Estimation, project management.
- Gilb, Tom. Principles of Software Engineering Management. Wokingham, England: Addison-Wesley, 1988. Practical advice for estimating software schedules. Focuses on the importance of controlling the project to achieve your objectives rather than passively predicting its outcome.
- Dreger, Brian. Function Point Analysis. Englewood Cliffs, N.J.: Prentice Hall, 1989. Function Point Analysis.
- Jones, Capers. Applied Software Measurement: Assuring Productivity and Quality. New York: McGraw-Hill, 1991. Function Point Analysis.
Links
- COCOMO II: from the roots… where it is developed.
- IFPUG: official Function Points web page.
- Softwaremetrics: a good Function Points introduction.
- A Function Points FAQ: not IFPUG related.
Thomas Menguy, 2001. ISIA Student.