

The Research Field: Requirements Engineering Described

A typical requirements engineering process is depicted in the following figure.


  • The requirements analyst is responsible for coordinating these processes while improving and documenting the requirements. The approach is iterative and has the following steps:
  1. Discuss the requirements
  2. Elicit user needs, document requirements, verify requirements
  3. Create a (semi-formal) model of the requirements
  4. Hand over the model to the software architect
  5. The architect makes changes to the model
  6. The changes have to be re-discussed with the stakeholders and checked for compliance with the customer's wishes.
  • Requirements are often visualized using domain models which are used for the software development process.
  • UML domain models are created from natural language requirements specifications.
  • The analyst then uses domain models to verify and rectify the stakeholders' input which was gathered in discussions and interviews.
  • These models are the blueprint for the software architect, who may change them. Such changes are usually not automatically fed back to the customer.

NLRP-bench: The Natural Language Requirements Processing Benchmark

"What we cannot speak about we must pass over in silence." Ludwig Wittgenstein (1889-1951), Austrian philosopher

  • Benchmarking means measuring the accuracy, recall, and precision with which a computer system executes a computing task, in a way that allows comparison between different hardware/software combinations.
  • Benchmarking is a tedious, repetitive task that demands attention to detail, and its results are subject to interpretation.
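
To make the measures concrete, here is a minimal sketch of how precision and recall could be computed for a tool that extracts class names from a requirements text. The function name `precision_recall` and the example data are assumptions made for illustration; they are not part of NLRP-bench.

```python
def precision_recall(predicted, expected):
    """Return (precision, recall) for two collections of extracted items."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    # Precision: share of the tool's answers that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the reference answers that the tool found.
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical data: class names a tool extracted from a requirements text,
# compared with the class names of a hand-made reference domain model.
tool_output = {"Customer", "Order", "Invoice", "System"}
reference = {"Customer", "Order", "Invoice", "Payment", "Product"}

precision, recall = precision_recall(tool_output, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.75 recall=0.60"
```

The tool found three of the five reference classes (recall 0.60), and three of its four answers were correct (precision 0.75). Which reference model counts as correct is exactly the part that remains subject to interpretation.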

We believe that benchmarks for requirements engineering tools can substantially accelerate progress in research, because they provide a basis for comparison and competition:

  • Benchmarks apply to any tool that automates an aspect of software engineering.
  • Sharing the work lets the community develop a wider range of meaningful and challenging benchmarks, so that
    • better tools can be built,
    • we know which techniques work best,
    • progress accelerates.

The Code of Conduct

We want to build a community around benchmarking natural language requirements processing. We therefore invite everybody to submit examples, solutions, and explanations of benchmarks that can be shared with us. We will name your institution and field of study and also link to your research, unless you wish otherwise. If you have any questions, hints, or ideas, please contact us.

The NLRP Benchmark

Why We Need This Benchmark

  • The majority (72%) of requirements are written in unrestricted natural language.
  • To make progress, one needs real requirements.
  • Real requirements are surprisingly hard to find:
    • Textbooks contain few examples, and they seem to be written by the authors or copied from other authors, who also made them up.
    • Published examples of natural language requirements processing use artificial, strongly restricted language.
    • Companies don't want to provide samples (we managed to get four).
  • Without real-life requirements texts, progress is difficult.
  • With them, we might be able to tackle one problem after another.

What Are Benchmarks

  • Benchmarks are sets of problems with a quality metric for potential solutions.
  • Independent teams apply their automated "solvers" to the problems, and the quality of their solutions can then be compared.
  • Benchmarks have a tremendous advantage over experiments with human subjects: they can be repeated as often as necessary, usually at moderate cost.
  • Setting up a benchmark is usually not free: data has to be collected and benchmark programs have to be prepared.
  • However, this cost can be amortized over many trials and provides a basis for comparison.
  • Over time, the benchmark must evolve (become harder and more general, and avoid overfitting).
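
The structure described above (a set of problems, a quality metric, and interchangeable solvers) can be sketched as a small harness. This is only an illustration under stated assumptions: the Jaccard overlap as quality metric, the two toy solvers, and the two sample sentences are all hypothetical and not taken from the NLRP benchmark.

```python
from typing import Callable, Dict, List, Set, Tuple

Problem = Tuple[str, Set[str]]      # (requirements text, reference answer)
Solver = Callable[[str], Set[str]]  # maps a text to a set of extracted items


def jaccard(predicted: Set[str], expected: Set[str]) -> float:
    """Assumed quality metric: overlap of prediction and reference, in [0, 1]."""
    union = predicted | expected
    return len(predicted & expected) / len(union) if union else 1.0


def run_benchmark(problems: List[Problem],
                  solvers: Dict[str, Solver]) -> List[Tuple[str, float]]:
    """Score every solver on every problem; return (name, mean score), best first."""
    if not problems:
        raise ValueError("a benchmark needs at least one problem")
    results = []
    for name, solve in solvers.items():
        scores = [jaccard(solve(text), expected) for text, expected in problems]
        results.append((name, sum(scores) / len(scores)))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Two deliberately naive, hypothetical solvers that guess domain concepts.
def capitalized_words(text: str) -> Set[str]:
    return {word.strip(".,") for word in text.split() if word[0].isupper()}


def all_words(text: str) -> Set[str]:
    return {word.strip(".,") for word in text.split()}


# Hypothetical problem set: sentences with hand-made reference concepts.
problems = [
    ("The Customer places an Order.", {"Customer", "Order"}),
    ("The System sends an Invoice to the Customer.",
     {"System", "Invoice", "Customer"}),
]

ranking = run_benchmark(problems, {
    "capitalized_words": capitalized_words,
    "all_words": all_words,
})
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because the problems and the metric are fixed, the run can be repeated at will and any new solver can be added to the dictionary and compared on the same basis. Making the benchmark harder over time amounts to extending `problems`.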

Successful Benchmark Examples

  • Computer architecture: Various benchmarks have been used for decades in order to compare processor performance.
    • The Standard Performance Evaluation Corporation (SPEC) publishes benchmarks to evaluate a range of performance criteria (CPU, web server, mail server, application server, power consumption, etc.).
    • Benchmarks combined with simulation have made computer architecture research quantitative.
    • Every performance feature must be substantiated on relevant benchmarks.
  • Databases: Transaction Processing Performance Council (TPC)
  • Speech recognition: large databases of speech samples are used in competitions to determine the best speech recognizer. Here, the issue is not speed, but error rate.
  • Speech translation: same idea.
  • Robotics: DARPA Grand Challenge for driverless vehicles (2004, 2005), and the DARPA Urban Challenge (2007).

In all of these cases, benchmarks resulted in swift and substantial progress. The winning techniques were quickly adopted by other teams and improved upon. How could we achieve comparable speed in software research?


  • Benchmarks are used less in software research than they could be.
  • All areas of software engineering could benefit: requirements, design, implementation, testing, maintenance.
  • With realistic benchmarks, one gets reliable and testable results.
  • Benchmarks accelerate progress: they eliminate inferior choices quickly and help concentrate effort on the challenges.
  • Share the work of preparing benchmarks among interested groups.
  • With a concentrated effort in benchmarking, we might speed up tool research dramatically.
  • When tool progress has been made, check usability with human subjects (the expensive experiment).