Class Hierarchies Benchmark

This project contains a large, real-world class hierarchy benchmark with 15246 classes and 63963 different derivation paths. The benchmark was previously used in the literature to study efficiency of type inclusion testing and dispatching techniques [1,2,3,4] and was maintained by Karel Driesen (see his original readme).

While working on Mach7 [5,6] we've tried to find this benchmark for our own experiments, but the original links were broken. Eventually we could dig up a version of it from archive.org, We revive it here for researchers to use, along with the translator to C++, we wrote for our own experiments.

COU to C++ Translator

While all the hierarchies were man-made, not all class hierarchies originated from C++, and sometimes used certain kinds of inheritance, not allowed in C++. This is why during conversion to C++, COU parser prunes inaccessible base classes (direct base class that is already an indirect base class) when used with repeated inheritance in order to satisfy semantic requirements of C++. It maintains the same number of virtual functions present in each class as well as the number of data members; the benchmarks, however, did not preserve the exact types of those.

Stats on Class Hierarchies in the Benchmark

The following table shows various parameters of the class hierarchies in each benchmark, after their adoption to C++:

Library	Language	Classes	Paths	Height	Roots	Leafs	Both	Prnt(avg)	Prnt(max)	Chld(avg)	Chld(max)	Name
DG2	Smalltalk	534	534	11	2	381	1	1	1	3.48	59	digitalk2
DG3	Smalltalk	1356	1356	13	2	923	1	1	1	3.13	142	digitalk3
ET+	C++	370	370	8	87	289	79	1	1	3.49	51	et++
GEO	Eiffel	1318	13798	14	1	732	0	1.89	16	4.75	323	geode
JAV	Java	604	792	10	1	445	0	1.08	3	4.64	210	java
LOV	Eiffel	436	1846	10	1	218	0	1.72	10	3.55	78	lov-object-editor
NXT	Objective-C	310	310	7	2	246	1	1	1	4.81	142	nextstep
SLF	Self	1801	36420	17	51	1134	0	1.05	9	2.76	232	self
UNI	C++	613	633	9	147	481	117	1.02	2	3.61	39	unidraw
		51	51	7	1	29	0	1.00	1	2.27	5	v1-collection
		18	18	5	1	11	0	1.00	1	2.43	5	v1-magnitude
		383	383	9	1	244	0	1.00	1	2.75	86	v1-object-nometa
		9	9	5	1	4	0	1.00	1	1.60	2	v1-set
		16	16	7	1	7	0	1.00	1	1.67	2	v1-stream
		53	53	8	1	31	0	1.00	1	2.36	7	v1-visualcomponent
VA2a	Smalltalk	3241	3241	14	1	2582	0	1	1	4.92	249	visualage2.all
VA2k	Smalltalk	2320	2320	13	1	1868	0	1	1	5.13	240	visualage2.kern
VW1	Smalltalk	387	387	9	1	246	0	1	1	2.74	87	visualworks1
VW2	Smalltalk	1956	1956	15	1	1332	0	1	1	3.13	181	visualworks2
		6	9	4	2	1	0	1.50	2	1.20	2	vtbl
	OVERALLS	15246	63963	17	298	10877	199	1.11	16	3.89	323	Overalls

The number of paths represents the number of distinct inheritance paths from the classes in the hierarchy to the roots of the hierarchy. This number reflects the number of possible subobjects in the hierarchy. The roots listed in the table are classes with no base classes. We will subsequently use the term "non-leaf" to refer to the possible root of a subhierarchy. Leafs are classes with no children, while "both" refers to utility classes that are both roots and leafs and thus neither have base nor derived classes. The average for the number of parents and the number of children were computed only among the classes having at least one parent or at least one child correspondingly.

71% of the classes in the entire benchmarks suite were leaf classes. Out of the 4369 non-leaf classes, 36% were spawning a subhierarchy of only 2 classes (including the root), 15% -- a subhierarchy of 3 classes, 10% of 4, 7% of 5 and so forth. Turning this into a cumulative distribution, M of subhierarchies had more than N classes in them:

M	1%	3%	5%	10%	20%	25%	50%	64%	100%
N	700	110	50	20	10	7	3	2	1

Which means 1% of subhierarchies had more than 700 classes in them, 3% of subhierarchies had more than 110 classes, 5% of subhierarchies had more than 50 classes, 10% of subhierarchies had more than 20 classes, 20% of subhierarchies had more than 10 classes, 25% of hierarchies had more than 7 classes, only 50% of hierarchies had more than 3 classes and 64% -- more than 2.

References

J. Vitek, R. N. Horspool, and A. Krall. Efficient type inclusion tests. In Proc. of the 12th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOPSLA '97, pages 142-157, New York, NY, USA, 1997. ACM.
A. Krall, J. Vitek, and R. N. Horspool. Near optimal hierarchical encoding of types. In In Proc. European Conference on Object-Oriented Programming, ECOOP'97, Lecture Notes in Computer Science, pages 128-145. Springer-Verlag, 1997.
Y. Zibin and J. Y. Gil. Efficient subtyping tests with PQencoding. In Proc. of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, OOPSLA '01, pages 96-107, New York, NY, USA, 2001. ACM.
R. Ducournau. Perfect hashing as an almost perfect subtype test. ACM Trans. Program. Lang. Syst., 30(6):33:1-33:56, Oct. 2008. ISSN 0164-0925.
Y. Solodkyy, G. Dos Reis, and B. Stroustrup. Open and efficient type switch for C++. In Proc. OOPSLA'12, pp 963-982. ACM.
Y. Solodkyy, G. Dos Reis, and B. Stroustrup. Open pattern matching for C++. In Proc. GPCE'13. ACM, October 2013.

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
cou		cou
parser		parser
test		test
.gitignore		.gitignore
.travis.yml		.travis.yml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Class Hierarchies Benchmark

COU to C++ Translator

Stats on Class Hierarchies in the Benchmark

References

About

Releases

Packages

Languages

solodon4/COU

Folders and files

Latest commit

History

Repository files navigation

Class Hierarchies Benchmark

COU to C++ Translator

Stats on Class Hierarchies in the Benchmark

References

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages