Software Effort Estimation …

Thomas Menguy | January 11, 2006

Here is an old article I wrote as a student about software metrics and effort estimation. Since my day-to-day job requires rough estimates of software projects at an early stage, I've begun to train myself on the subject and resurrected this old essay.

Concrete Estimation (size, effort, schedule).

This little overview is mainly derived from:

This book:
McConnell, Steve, Rapid Development, Microsoft Press, 1996
and:
Softwaremetrics.com for the function point description.

Estimation and Software Projects

Most projects overshoot their estimated schedules by anywhere from 25 to 100 percent, but a few organizations have achieved schedule-prediction accuracies to within 10 percent, and 5 percent is not unheard of (Jones 1994).
Without an accurate schedule estimate, there is no foundation for effective planning and no support for rapid development.

Estimation Process Overview:

Software estimation is difficult, and what some people try to do with software estimation isn't even theoretically possible. Upper management, lower management, customers, and some developers don't seem to understand why estimation is so hard. People who don't understand software estimation's inherent difficulties can play an unwitting role in making estimation even harder than it already is.
The basic software estimation story is a process of gradual refinement. Until each feature is understood in detail, you can't estimate the cost of a program precisely. To find a ballpark range for your current estimate, multiply it by the optimistic and pessimistic factors in the following table:

	Effort and Size		Schedule
Phase	Optimistic	Pessimistic	Optimistic	Pessimistic
Initial product concept	0.25	4.0	0.60	1.60
Approved product concept	0.5	2.0	0.80	1.25
Requirements specification	0.67	1.5	0.85	1.15
Product design specification	0.80	1.25	0.90	1.10
Detailed design specification	0.90	1.10	0.95	1.05
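
As a minimal sketch of how the table can be used, the snippet below multiplies a single-point estimate by the optimistic and pessimistic factors for a given phase. The function name `estimate_range` and the 50 man-month figure are illustrative assumptions, not part of the original material.

```python
# Multipliers copied from the table above: (optimistic, pessimistic)
# for effort/size and for schedule, keyed by project phase.
MULTIPLIERS = {
    "initial product concept": {"effort": (0.25, 4.0), "schedule": (0.60, 1.60)},
    "approved product concept": {"effort": (0.5, 2.0), "schedule": (0.80, 1.25)},
    "requirements specification": {"effort": (0.67, 1.5), "schedule": (0.85, 1.15)},
    "product design specification": {"effort": (0.80, 1.25), "schedule": (0.90, 1.10)},
    "detailed design specification": {"effort": (0.90, 1.10), "schedule": (0.95, 1.05)},
}


def estimate_range(point_estimate, phase, kind="effort"):
    """Return (optimistic, pessimistic) bounds for a single-point estimate."""
    low, high = MULTIPLIERS[phase][kind]
    return point_estimate * low, point_estimate * high


# Hypothetical example: a 50 man-month estimate made at requirements time.
low, high = estimate_range(50, "requirements specification")
print(f"{low:.1f} to {high:.1f} man-months")
```

The same call with `kind="schedule"` gives the range for a schedule estimate in calendar months.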

The process of creating an accurate development schedule consists of three steps:

Estimate the Size of the product (number of lines of code or function points)
Estimate the Effort (man-months)
Estimate the Schedule (calendar months)

Size Estimation:

You can estimate the size of a project in any of several ways:

Use an algorithmic approach such as function points, that estimates program size from program features.
Use size estimation software.
If you have already worked on a similar project and know its size, estimate each major piece of the new system as a percentage of the size of a similar piece of the old system. Estimate the total size of the new system by adding up the estimated sizes of each of the pieces.
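
The last approach (sizing by analogy with an earlier project) can be sketched as follows. The piece names and sizes are made-up placeholders; only the arithmetic (piece size times a percentage, then a sum) comes from the text.

```python
def size_by_analogy(old_sizes, relative_sizes):
    """Estimate a new system piece by piece.

    old_sizes: lines of code of each major piece of the old system.
    relative_sizes: size of each new piece as a fraction of the
    matching old piece (1.5 means 150 percent).
    """
    pieces = {name: old_sizes[name] * ratio for name, ratio in relative_sizes.items()}
    return pieces, sum(pieces.values())


# Hypothetical figures for illustration only.
old = {"ui": 12000, "storage": 8000, "reports": 5000}
rel = {"ui": 1.5, "storage": 1.0, "reports": 0.5}
pieces, total = size_by_analogy(old, rel)
print(total)
```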

We will look at an approach other than counting lines of code:
Function-Point Estimation:
Function Point Analysis was first developed by Allan J. Albrecht in the mid-1970s. It was an attempt to overcome difficulties associated with lines of code as a measure of software size, and to assist in developing a mechanism to predict effort associated with software development. The method was first published in 1979, then again in 1983. In 1984 Albrecht refined the method, and since 1986, when the International Function Point Users Group (IFPUG) was set up, several versions of the Function Point Counting Practices Manual have been published by IFPUG. The current version of the IFPUG Manual is 4.1.
The number of function points in a program is based on the number and complexity of each of the following items:

External Inputs (EI) - an elementary process in which data crosses the boundary from outside to inside. This data may come from a data input screen or another application. The data may be used to maintain one or more internal logical files. The data can be either control information or business information. If the data is control information, it does not have to update an internal logical file. In practice: screens, forms, boxes, controls, or messages through which an end-user or other program adds, deletes, or changes a program's data. This includes any input that has a unique format or unique processing logic.
External Outputs (EO) - an elementary process in which derived data passes across the boundary from inside to outside. Additionally, an EO may update an ILF. The data creates reports or output files sent to other applications. These reports and files are created from one or more internal logical files and external interface files. In practice: screens, reports, graphs, or messages that the program generates for use by an end-user or other program. This includes any output that has a different format or requires different processing logic than other output types.
External Inquiries (EQ) - an elementary process with both input and output components that results in data retrieval from one or more internal logical files and external interface files. The input process does not update any internal logical files, and the output side does not contain derived data. In practice: input/output combinations in which an input results in an immediate, simple output. The term originated in the database world and refers to a direct search for specific data, usually using a single key. In modern GUI applications, the line between inquiries and outputs is blurry. Generally, however, queries retrieve data directly from a database and only provide rudimentary formatting, whereas outputs can process, combine, or summarize complex data and can be highly formatted.
Internal Logical Files (ILF's) - a user-identifiable group of logically related data that resides entirely within the application's boundary and is maintained through external inputs. In practice: major logical groups of end-user data or control information that are completely controlled by the program. A logical file might consist of a single flat file or a single table in a relational database.
External Interface Files (EIF's) - a user-identifiable group of logically related data that is used for reference purposes only. The data resides entirely outside the application and is maintained by another application. The external interface file is an internal logical file for another application. In practice: files controlled by other programs with which the program being counted interacts. This includes each major logical group of data or control information that enters or leaves the program.

After the components have been classified as one of the five major components (EI's, EO's, EQ's, ILF's or EIF's), a ranking of low, average or high is assigned. For transactions (EI's, EO's, EQ's) the ranking is based upon the number of files updated or referenced (FTR's) and the number of data element types (DET's). For both ILF's and EIF's the ranking is based upon record element types (RET's) and data element types (DET's). A record element type is a user-recognizable subgroup of data elements within an ILF or EIF. A data element type is a unique, user-recognizable, nonrecursive field. Each of the following tables assists in the ranking process; the numerical rating attached to each ranking is given in the Function Points table further down. For example, an EI that references or updates 2 file types (FTR's) and has 7 data elements would be assigned a ranking of average and an associated rating of 4, where FTR's are the combined number of Internal Logical Files (ILF's) referenced or updated and External Interface Files (EIF's) referenced.

External Inputs (EI):
	Data Elements (DET's)
FTR's	1-4	5-15	>15
0-1	Low	Low	Average
2	Low	Average	High
>2	Average	High	High

External Outputs (EO) and External Inquiries (EQ):
	Data Elements (DET's)
FTR's	1-5	6-19	>19
0-1	Low	Low	Average
2-3	Low	Average	High
>3	Average	High	High

Like all components, EQ's are rated and scored. Basically, an EQ is rated (Low, Average or High) like an EO, but assigned a value like an EI. The rating is based upon the total number of unique data elements (DET's) and file types referenced (FTR's), counted over the input and output sides combined. If the same FTR is used on both the input and output side, it is counted only once. If the same DET is used on both the input and output side, it is counted only once.
For both ILF's and EIF's the number of record element types and the number of data element types are used to determine a ranking of low, average or high. A Record Element Type (RET) is a user-recognizable subgroup of data elements within an ILF or EIF. A Data Element Type (DET) is a unique, user-recognizable, nonrecursive field on an ILF or EIF.

	Data Elements (DET's)
RET's	1-19	20-50	>50
1	Low	Low	Average
2-5	Low	Average	High
>5	Average	High	High
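
The three ranking matrices above lend themselves to a small lookup. The sketch below assumes that the first matrix applies to External Inputs and the second to External Outputs and Inquiries (the text says an EQ is rated like an EO); the names `MATRICES`, `RATINGS` and `rate` are my own.

```python
import bisect

# For each component type: upper bounds of the first two row bands
# (FTR's for transactions, RET's for files) and of the first two
# column bands (data element types). Anything above falls in band 3.
MATRICES = {
    "EI": ((1, 2), (4, 15)),
    "EO": ((1, 3), (5, 19)),
    "EQ": ((1, 3), (5, 19)),   # rated like an EO
    "ILF": ((1, 5), (19, 50)),
    "EIF": ((1, 5), (19, 50)),
}

# All three matrices share the same low/average/high layout.
RATINGS = [
    ["Low", "Low", "Average"],
    ["Low", "Average", "High"],
    ["Average", "High", "High"],
]


def rate(component, refs, dets):
    """Rank a component from its FTR (or RET) count and its DET count."""
    row_bounds, col_bounds = MATRICES[component]
    row = bisect.bisect_left(row_bounds, refs)
    col = bisect.bisect_left(col_bounds, dets)
    return RATINGS[row][col]


# The example from the text: an EI with 2 FTR's and 7 data elements.
print(rate("EI", 2, 7))
```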

The counts for each level of complexity for each type of component can be entered into a table such as the following one. Each count is multiplied by the numerical rating shown to determine the rated value. The rated values on each row are summed across the table, giving a total value for each type of component. These totals are then summed down the last column to arrive at the Total Number of Unadjusted Function Points.

	Function Points
Program Characteristic	Low Complexity	Average Complexity	High Complexity	Total
Number of External Inputs	__*3=	__*4=	__*6=	__
Number of External Outputs	__*4=	__*5=	__*7=	+__
Number of External Inquiries	__*3=	__*4=	__*6=	+__
Number of Internal Logical Files	__*7=	__*10=	__*15=	+__
Number of External Interface Files	__*5=	__*7=	__*10=	+__
			Total Number of Unadjusted Function Points =	__
			Multiplied by Value Adjustment Factor (VAF) *	__
			Total Adjusted Function Points =	__
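
Filling in the worksheet is a weighted sum, which could be sketched like this. The weights come from the table above; the component counts in the example are invented for illustration.

```python
# Numerical rating per component type and complexity level,
# copied from the worksheet above.
WEIGHTS = {
    "EI": {"Low": 3, "Average": 4, "High": 6},
    "EO": {"Low": 4, "Average": 5, "High": 7},
    "EQ": {"Low": 3, "Average": 4, "High": 6},
    "ILF": {"Low": 7, "Average": 10, "High": 15},
    "EIF": {"Low": 5, "Average": 7, "High": 10},
}


def unadjusted_function_points(counts):
    """Sum count * rating over every component type and complexity level."""
    return sum(
        WEIGHTS[component][level] * n
        for component, levels in counts.items()
        for level, n in levels.items()
    )


# Hypothetical counts for a small application.
counts = {
    "EI": {"Low": 5, "Average": 3, "High": 1},
    "EO": {"Low": 4, "Average": 2},
    "EQ": {"Low": 3},
    "ILF": {"Low": 2, "Average": 1},
    "EIF": {"Low": 1},
}
uaf = unadjusted_function_points(counts)
print(uaf)
```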

The value adjustment factor (VAF) is based on 14 general system characteristics (GSC's) that rate the general functionality of the application being counted. The VAF ranges from 0.65 to 1.35.
The degrees of influence range on a scale of 0 to 5, from no influence to strong influence.
The table below is intended to provide an overview of each GSC.

General System Characteristic		Brief Description
1.	Data communications	How many communication facilities are there to aid in the transfer or exchange of information with the application or system?
2.	Distributed data processing	How are distributed data and processing functions handled?
3.	Performance	Was response time or throughput required by the user?
4.	Heavily used configuration	How heavily used is the current hardware platform where the application will be executed?
5.	Transaction rate	How frequently are transactions executed daily, weekly, monthly, etc.?
6.	On-Line data entry	What percentage of the information is entered On-Line?
7.	End-user efficiency	Was the application designed for end-user efficiency?
8.	On-Line update	How many ILF’s are updated by On-Line transaction?
9.	Complex processing	Does the application have extensive logical or mathematical processing?
10.	Reusability	Was the application developed to meet one or many user’s needs?
11.	Installation ease	How difficult is conversion and installation?
12.	Operational ease	How effective and/or automated are start-up, back-up, and recovery procedures?
13.	Multiple sites	Was the application specifically designed, developed, and supported to be installed at multiple sites for multiple organizations?
14.	Facilitate change	Was the application specifically designed, developed, and supported to facilitate change?

Once all 14 GSC's have been rated, they are combined using the IFPUG Value Adjustment Factor (VAF) equation:
VAF = 0.65 + Sc/100
where:
Ci = degree of influence for each General System Characteristic, assigned by you, from 0 (no influence) to 5 (strong influence)
Sc = sum of Ci (Sc = C1 + C2 + … + C14)
The final Function Point count is obtained by multiplying the Unadjusted Function Point count (UAF) by the VAF:
FP = UAF * VAF
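
A short sketch of the adjustment step, under the assumption that the 14 degrees of influence are supplied as a plain list; the function names and the sample ratings are hypothetical.

```python
def value_adjustment_factor(degrees):
    """VAF = 0.65 + Sc/100, where Sc is the sum of the 14 degrees of influence."""
    if len(degrees) != 14:
        raise ValueError("exactly 14 general system characteristics are expected")
    if any(not 0 <= d <= 5 for d in degrees):
        raise ValueError("each degree of influence must be between 0 and 5")
    return 0.65 + sum(degrees) / 100


def adjusted_function_points(uaf, degrees):
    """FP = UAF * VAF."""
    return uaf * value_adjustment_factor(degrees)


# Hypothetical project: 97 unadjusted function points, every GSC rated 3.
degrees = [3] * 14
print(round(value_adjustment_factor(degrees), 2))
print(round(adjusted_function_points(97, degrees), 2))
```

With all characteristics at 0 the factor is 0.65, and with all at 5 it is 1.35, which matches the range given earlier.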
Now you can compute cost, effort, and schedule from the data of previous projects, or use Jones's First-Order Estimation Practice to find a rough schedule.
Function points do not depend on any particular technology. There is nevertheless a relationship between function points and the number of lines of code, which you can derive from your team's or company's data.

Estimation tips:

Avoid off-the-cuff estimates or simple guesses: take the time to estimate, and never answer without having quietly worked on the estimate.
Allow time for the estimate and plan it.
Use data from previous projects.
Use developer-based estimates: have the developers who will do the work produce the estimates.
Estimate by walk-through: have each team member estimate pieces of the project individually, then work until you reach consensus on the high and low ends of the estimation ranges.
Estimate by categories: easy, medium, hard…
Estimate at a low level of detail: a 10% error on one big piece is 10% high or 10% low, whereas 10% errors on 50 small pieces tend to cancel each other out.
Don't omit common tasks such as: cutover, data conversion, installation, customization, management of the beta test program, demonstrating the program to customers or users, attendance at change-control meetings, maintenance work on existing systems during the project, defect corrections, administration related to defect tracking, coordination with QA, support for user documentation, review of technical documents, integration, vacations, holidays, sick days, company and department meetings, and training.
Use several different estimation techniques and compare the results.
Change estimation practices as the project progresses.
Don’t forget risk management in your estimate.
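
The tip about estimating at a low level of detail can be illustrated with a little arithmetic. The sketch below rests on strong assumptions that are mine, not the source's: the pieces are of equal size, and their errors are independent, unbiased, and of the same relative spread. Under those assumptions the relative error of the total shrinks with the square root of the number of pieces.

```python
import math


def relative_error_of_total(piece_error, n_pieces):
    """Relative standard error of a total built from n equal-sized pieces.

    Assumes independent, unbiased errors with the same relative spread
    on every piece, which is an idealization.
    """
    return piece_error / math.sqrt(n_pieces)


print(f"{relative_error_of_total(0.10, 1):.1%}")   # one big piece
print(f"{relative_error_of_total(0.10, 50):.1%}")  # fifty small pieces
```

If the individual estimates all lean the same way (for instance, all optimistic), the errors do not cancel and this benefit disappears.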

Effort Estimation:

You'll need an effort estimate (man-months) in order to know how many people to put on your project, and having an effort estimate makes it easy to derive the schedule estimate.

Use estimation software.
Use the schedule tables in Ballpark Schedule Estimation.
Use your organization's historical data.
Use an algorithmic approach such as COCOMO (Boehm 1981) or Putnam and Myers's lifecycle model (Putnam and Myers 1992) to convert a lines-of-code estimate into an effort estimate.

Schedule Estimation:

You can compute the schedule from the effort estimate by using:

schedule in months = 3.0 * man-months^(1/3) (or: sch = 3.0 * effort^(1/3))
You can also use the tables in Ballpark Schedule Estimation below to compute a schedule and effort from a size in lines of code, or:
Jones's First-Order Estimation Practice
Once you have the function-point count, raise it to the appropriate power selected in the table below. The exponents in the table are derived from Jones’s analysis of his database of several thousand projects.

Kind of Software	Best in Class	Average	Worst In Class
Systems	0.43	0.45	0.48
Business	0.41	0.43	0.46
Shrink-wrap	0.39	0.42	0.45

For example, if you have a 350 function-point shrink-wrap project and your team's technical level is average, you would raise 350 to the 0.42 power (350^0.42), for a rough schedule of 12 calendar months.
This practice isn't a substitute for more careful schedule estimation, but it does provide a simple means of getting a rough schedule that's better than guessing.
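
Both schedule formulas in this section fit in a few lines. This is only a sketch: the exponents are copied from the table above, while the function names and the 64 man-month input are illustrative choices.

```python
# Exponents from the First-Order Estimation table above.
JONES_EXPONENTS = {
    "systems": {"best": 0.43, "average": 0.45, "worst": 0.48},
    "business": {"best": 0.41, "average": 0.43, "worst": 0.46},
    "shrink-wrap": {"best": 0.39, "average": 0.42, "worst": 0.45},
}


def first_order_schedule(function_points, kind, level="average"):
    """Rough schedule in calendar months from a function-point count."""
    return function_points ** JONES_EXPONENTS[kind][level]


def schedule_from_effort(man_months):
    """Schedule in calendar months = 3.0 * effort^(1/3)."""
    return 3.0 * man_months ** (1 / 3)


# The example from the text: 350 function points, shrink-wrap, average team.
print(f"{first_order_schedule(350, 'shrink-wrap'):.1f} months")
# Hypothetical effort of 64 man-months.
print(f"{schedule_from_effort(64):.1f} months")
```

The first call gives about 11.7, which rounds to the 12 calendar months quoted above.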

Ballpark Schedule Estimation:

The following tables describe 3 kinds of projects:

Systems Software: includes OS software, device drivers, compilers, and code libraries.
Business Software: in-house systems that are used by a single organization. They run on a limited set of hardware, perhaps only a single computer. Payroll systems, accounting systems, and inventory control systems, as well as IS, IT and MIS software, are in that category.
Shrink-wrap Software: software that is packaged and sold commercially (word processors and spreadsheets, but also financial analysis software, screenplay-writing and legal case management programs).

Systems software does not include embedded software, firmware, real-time systems, scientific software and the like. Productivity for these kinds of systems would be much lower. For your particular project, you can mix the models, for example 40% business and 60% shrink-wrap, and recompute the schedule and effort obtained from the following tables with these proportions.
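
One plausible reading of "mixing the models" is a weighted average of the table values, which is what the sketch below does; that interpretation is my assumption. The sample figures are the business and shrink-wrap entries for 50,000 lines of code taken from the last of the three tables in this section.

```python
def mixed_estimate(estimates, proportions):
    """Blend per-kind (schedule, effort) pairs using the given proportions.

    estimates: kind -> (schedule in calendar months, effort in man-months),
    read from one row of a schedule table.
    proportions: kind -> fraction of the project; fractions must sum to 1.
    """
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    schedule = sum(estimates[kind][0] * p for kind, p in proportions.items())
    effort = sum(estimates[kind][1] * p for kind, p in proportions.items())
    return schedule, effort


# 50,000 lines of code, 40% business and 60% shrink-wrap.
row = {"business": (11, 71), "shrink-wrap": (14, 115)}
schedule, effort = mixed_estimate(row, {"business": 0.4, "shrink-wrap": 0.6})
print(f"{schedule:.1f} months, {effort:.1f} man-months")
```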

Before using these tables, you may want to shorten the schedule. Here is how to recompute the effort (this is possible if you start from the nominal project table):

Schedule compression factor = desired schedule / initial schedule

Compressed schedule effort = initial effort / schedule compression factor
If you have an initial schedule of 12 months and an initial effort of 78 man-months, and you want a 10-month schedule, the schedule compression factor is 10/12 = 0.83, which yields a compressed schedule effort of 94 man-months. In other words, the 17 percent reduction in the schedule requires a 21 percent increase in effort.
Most researchers have concluded that it isn’t possible to achieve a schedule compression factor lower than about 0.75-0.80 (Boehm 1981; Putnam and Myers 1992, Jones 1994).
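
The compression rule, including the practical floor just mentioned, might be coded as below. The function name is hypothetical, and the default `min_factor=0.75` simply takes the lower end of the 0.75-0.80 range quoted above.

```python
def compressed_effort(initial_schedule, initial_effort, desired_schedule, min_factor=0.75):
    """Return (compression factor, compressed effort in man-months)."""
    factor = desired_schedule / initial_schedule
    if factor < min_factor:
        raise ValueError(
            f"compression factor {factor:.2f} is below the practical limit {min_factor}"
        )
    return factor, initial_effort / factor


# The example from the text: 12 months and 78 man-months, squeezed to 10 months.
factor, effort = compressed_effort(12, 78, 10)
print(f"factor {factor:.2f}, effort {effort:.0f} man-months")
```

Asking for an 8-month schedule in the same example would give a factor of 0.67 and raise an error, reflecting the limit reported by the researchers cited.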

Shortest possible schedules:
	Systems Products		Business Products		Shrink-Wrap Products
System Size (lines of code)	Schedule (calendar months)	Effort (man-months)	Schedule (calendar months)	Effort (man-months)	Schedule (calendar months)	Effort (man-months)
10,000	6	25	3.5	5	4.2	8
15,000	7	40	4.1	8	4.9	13
20,000	8	57	4.6	11	5.6	19
25,000	9	74	5.1	15	6	24
30,000	9	110	5.5	22	7	37
35,000	10	130	5.8	26	7	44
40,000	11	170	6	34	7	57
45,000	11	195	6	39	8	66
50,000	11	230	7	46	8	79
60,000	12	285	7	57	9	98
70,000	13	350	8	71	9	120
80,000	14	410	8	83	10	140
90,000	14	480	9	96	10	170
100,000	15	540	9	110	11	190
120,000	16	680	10	140	11	240
140,000	17	820	10	160	12	280
160,000	18	960	10	190	13	335
180,000	19	1,100	11	220	13	390
200,000	20	1,250	11	250	14	440
250,000	22	1,650	13	330	15	580
300,000	24	2,100	14	420	16	725
400,000	27	2,900	15	590	19	1,000
500,000	30	3,900	17	780	20	1,400

Efficient schedules:
	Systems Products		Business Products		Shrink-Wrap Products
System Size (lines of code)	Schedule (calendar months)	Effort (man-months)	Schedule (calendar months)	Effort (man-months)	Schedule (calendar months)	Effort (man-months)
10,000	8	24	4.9	5	5.9	8
15,000	10	38	5.8	8	7	12
20,000	11	54	7	11	8	18
25,000	12	70	7	14	9	23
30,000	13	97	8	20	9	32
35,000	14	120	8	24	10	39
40,000	15	140	9	30	10	49
45,000	16	170	9	34	11	57
50,000	16	190	10	40	11	67
60,000	18	240	10	49	12	83
70,000	19	290	11	61	13	100
80,000	20	345	12	71	14	120
90,000	21	400	12	82	15	140
100,000	22	450	13	93	15	160
120,000	23	560	14	115	16	195
140,000	25	670	15	140	17	235
160,000	26	709	15	160	18	280
180,000	28	910	16	190	19	320
200,000	29	1,300	17	210	20	360
250,000	32	1,300	19	280	22	470
300,000	34	1,650	20	345	24	590
400,000	38	2,350	22	490	27	830
500,000	42	3,100	25	640	29	1,100

Nominal schedules:
	Systems Products		Business Products		Shrink-Wrap Products
System Size (lines of code)	Schedule (calendar months)	Effort (man-months)	Schedule (calendar months)	Effort (man-months)	Schedule (calendar months)	Effort (man-months)
10,000	10	48	6	9	7	15
15,000	12	76	7	15	8	24
20,000	14	110	8	21	9	34
25,000	15	140	9	27	10	44
30,000	16	185	9	37	11	59
35,000	17	220	10	44	12	71
40,000	18	270	10	54	13	88
45,000	19	310	11	61	13	100
50,000	20	360	11	71	14	115
60,000	21	440	12	88	15	145
70,000	23	540	13	105	16	175
80,000	24	630	14	125	17	210
90,000	25	730	15	140	17	240
100,000	26	820	15	160	18	270
120,000	28	1,000	16	200	20	335
140,000	30	1,200	17	240	21	400
160,000	32	1,400	18	280	22	470
180,000	34	1,600	19	330	23	240
200,000	35	1,900	20	370	24	610
250,000	38	2,400	22	480	26	800
300,000	41	3,000	24	600	29	1,000
400,000	47	4,200	27	840	32	1,400
500,000	51	5,500	29	1,100	35	1,800

Bibliography

McConnell, Steve. Rapid Development. Microsoft Press, 1996. Presents all the factors needed to achieve rapid development, from risk evaluation, good practices, and classic mistakes to team psychology and negotiating. Really great.
Boehm, Barry W. Software Engineering Economics. Englewood Cliffs, N.J.: Prentice Hall, 1981. The COCOMO cost-estimation model, by its creator.
DeMarco, Tom. Controlling Software Projects. New York: Yourdon Press, 1982. Describes several estimation models.
Putnam, Lawrence H., and Ware Myers. Measures for Excellence: Reliable Software on Time, Within Budget. Englewood Cliffs, N.J.: Yourdon Press, 1992. Presents a full-fledged software-project estimation approach. Explains how to calibrate a simple cost-estimation model to your organization and how to use it to estimate medium to large projects.
Jones, Capers. Assessment and Control of Software Risks. Englewood Cliffs, N.J.: Yourdon Press, 1994. Estimation, project management.
Gilb, Tom. Principles of Software Engineering Management. Wokingham, England: Addison-Wesley, 1988. Practical advice for estimating software schedules. Focuses on the importance of controlling the project to achieve your objectives rather than making passive predictions about it.
Dreger, Brian. Function Point Analysis. Englewood Cliffs, N.J.: Prentice Hall, 1989. Function Point Analysis.
Jones, Capers. Applied Software Measurement: Assuring Productivity and Quality. New York: McGraw-Hill, 1991. Function Point Analysis.

Links

COCOMO II: from the roots… where it is developed.
IFPUG: Official Function Points web page.
Softwaremetrics: A good Function Points introduction.
A Function Points FAQ: Not IFPUG related.

Thomas Menguy, 2001. ISIA Student.