Bart Kienhuis

Complex embedded systems are used to facilitate real-time data processing applications. The embedded systems we are interested in typically have to process large amount of data that represents, for example, medical images, astronomy data, network traffic, or even stock exchange information. To cope with their surroundings, these systems need to provide some form of dynamic parameterization. For example, to adjust parameters of filters, to switch in or out particular computational blocks or to select regions for further analysis; all at runtime. The development of such systems is challenging as a designer needs to develop a high-performing system that executes in a stable and consistent way with the parameterization. In this paper, we want to present how we can develop such a system using the Compaan technology and show a real-time medical X-Ray image processing case study. The Compaan technology realizes this design faster and more reliable than traditional design approaches that rely on more low level or adhoc development techniques to realize parameterization.

We can significantly reduce the time required to realize designs if it is possible to find limits to the performance of an embedded system, solely based on high-level system specifications. For that purpose, we present in this paper the cprof profiler, which determines the number of clock cycles needed to execute a C-program in hardware. The cprof tool is based on the Clang compiler front-end to parse C-programs and to produce instrumented source code for the profiling. Using cprof, we determine a lower and upper bound limit for all 29 cases of the PolyBench/C benchmark suite. The lower and upper bound are determined using the absolute performance estimations assuming all statement are mapped onto the same processing resource and unbounded performance estimations assuming unlimited resources. We also compared the clock cycles found by cprof with RTL implementations for all 29 Polybench/C cases and found that cprof determines with 1.2 procent accuracy the correct number of clock cycles. It does this in a fraction of the time compared to the time needed to do a full RTL simulation.

In this paper, we present a method to analyze different implementations of stream-based applications on heterogeneous multiprocessor systems. We take both resource usage and performance constraints into account. For the first aspect we use an empirical cost model. For the second aspect we build a network of cycle-accurate processor simulators. The simulation and resource cost estimation have been integrated in an existing framework, allowing one to generate fast exploration simulations, cycle-accurate simulations and FPGA implementations from a single system level specification. We show that with our methodology cycle-accurate performance numbers of candidate systems can be obtained. In our experiments with the QR and MJPEG applications, we found that the error of our resource cost model is below two percent.

The demand for embedded computing power is continuously increasing and FPGAs are becoming very interesting computing platforms, as they provide huge amounts of customizable parallelism. However, programming them is challenging, let alone from a high level language. The ESPAM methodology was already presented to quickly obtain realizations on FPGAs from sequential C code. The realization consists of a network of processors and IP cores. In this approach, a problem was that the IP cores had to be provided manually. In this paper, we present an extension on the ESPAM methodology by incorporating the industrial high level synthesis tool PICO from Synfora Inc. In this way, we realize the automated generation of efficient hardware implementations on FPGAs from a single sequential C input specification of a streaming application. We demonstrate our approach for the Sobel and QR applications.

Deadlock detection is a well-studied problem that may be considered solved from a theoretical point of view. However, specific cases may demand for specific solutions. One such specific case is deadlock detection in Kahn Process Networks. The Kahn process network (KPN) is an expressive model of computation that is widely used to model and specify deterministic streaming applications. The processes in the network communicate point-to-point over FIFO channels whose sizes are undecidable in general. As a consequence, deadlock may occur and, therefore, a run-time deadlock detection mechanism is required. This can be organized in a centralized way, a distributed way, and a hierarchical way. Centralized and distributed procedures have been reported in the literature. In this paper, we propose a novel hierarchical approach for KPN deadlock detection at run time. We also give results for the implementation on the IBM Cell processor.

At Leiden Embedded Research Center (LERC), we are building a tool chain called Compaan/Laura that allows us to map rapidly and efficiently signal processing applications written in Matlab onto reconfigurable platforms. In this chain, first the Matlab code is converted automatically to executable Kahn Process Network (KPN) specification, then a tool called Laura transforms the PN specification into design implementations described as synthesizable VHDL.
The applications targeted by Compaan are usually data-flow intensive, requiring large computational power. Therefore, an important issue in Laura is the derivation of efficient and scalable hardware control structures. This control is based on an abstract representation given as parametrized polytopes. Although this representation can be directly translated into nested guarded for-loops (very suitable for software implementation) its mapping to hardware is much more difficult. In this paper we investigate the opportunities of deriving different hardware realizations for this control, and explore the trade off between speed and resources usage.

The complexity of embedded applications has reached a point where the performance requirements of these applications can no longer be supported by embedded systems based on a single processor. Instead, emerging embedded System-on-Chip platforms are increasingly becoming multicore architectures. On these platforms two major problems emerge: how to design and how to program such multicore platforms in a systematic and automated way? We focus on the latter and are building compiler support for programming multicore platforms. For uniprocessor systems, auto parallelization techniques such as vectorization, instruction level parallelism, and software pipelining mainly focused on the back-end of the compiler. We believe that for multicore systems this is not sufficient and a difference can be made in the middle-end. Therefore, we present a compiler framework in which we use the Kahn Process Network (KPN) model of computation in the Intermediate Representation (IR) of the compiler. This IR model is crucial and allows us to focus on: 1) code generation for multicore platforms, and 2) KPN profiling and network restructuring transformations, where the former is a prerequisite for the latter. In this paper we focus on 1), and manually apply the network restructuring transformations. We demonstrate a working prototype compiler tool-chain that automatically creates multithreaded code from a sequential program specification and show results for the Cell processor.

When studying the IXP Network processor architecture from Intel, we found quite some interesting aspects that make the IXP attractive for stream-based applications. The architecture is highly optimized for streaming data, albeit in the form of Internet packets. Furthermore, the architecture has Gigabit Ethernet connectors for handling incoming and outgoing traffic and can process this data at real-time using dedicated microengines. In this paper, we try to answer three questions; 1) Can we use the IXP architecture for stream-based applications? 2) can we map applications written as a KPN onto an IXP? 3) can we integrate the generation of KPNs using the Compaan compiler with a tool flow that maps the KPN to an IXP, thereby make the programming of IXP much simpler? As will be shown, all three steps can be performed and we show that we can map automatically two non-Internet stream-based applications (QR and DWT) onto the IXP.

Kahn Process Networks (KPN) are very suitable for systematic mapping onto multiprocessor architectures. Running applications written in this parallel program specification on a multiprocessor architecture, however, does not guarantee that the runtime requirements are met. Therefore, it may be necessary to further analyze and optimize Kahn process networks. In this paper, we will present a process splitting technique that results in a functionally equivalent process network, but with a changed and optimized network structure. The class of networks that can be handled is generic as it is not restricted to static networks. The presented approach can also handle processes with dynamic program statements. We prototyped the splitting transformation in the GCC compiler for a JPEG decoder application. This resulted in a 21% performance improvement.

In this paper, we present a technique for compact representation of execution sequences in terms of efficient looping constructs. Here, by a looping construct, we mean a compact way of specifying a finite repetition of a set of execution primitives. Such compaction, which can be viewed as a form of hierarchical runlength encoding (RLE), has application in many VLSI signal processing contexts, including efficient control generation for Kahn processes on FPGAs, and software synthesis for static dataflow models of computation. In this paper, we significantly generalize previous models for loop-based code compaction of DSP programs to yield a configurable code compression methodology that exhibits a broad range of achievable trade-offs. Specifically, we formally develop and apply to DSP hardware and software synthesis a parameterizable loop scheduling approach with compact format, dynamic reconfigurability, and low-overhead decompression.

At Leiden University, we are developing a design methodology that allows for fast mapping of nested-loop applications (e.g. DSP, Imaging, or Multi-Media) written in a subset of Matlab onto reconfigurable devices. This design methodology is implemented into a tool chain that we call Compaan/Laura . This methodology generates a process network in which the inter-process communication takes place in a point-to-point fashion. Four types of point-to-point inter-processor communication exist in the PN. Two of them use a FIFO like communication and the other two use a cache like memory to exchange data. In this paper, we investigate the realizations for the four communication types and show that point-to-point communication at the level of scalars can be realized automatically and very efficiently in today's FPGAs.

The Compaan/Laura tool chain maps nested loop applications written in Matlab onto reconfigurable platforms, such as FPGAs. \Compaan rewrites the original Matlab application as a Process Network in which the control is parameterized and distributed. This control is given as parameterized polytopes that are expressed in terms of pseudo-linear expressions. These expressions cannot always be mapped efficiently onto hardware as they contain multiplication and integer division operations. This obstructs the data flow through the processes. Therefore, we present in this paper the Expression Compiler that efficiently maps pseudo-linear expressions onto reconfigurable hardware in such a way that the distributed and parameterized control never obstructs the data flow through processors. This compiler employs techniques like number theory axioms, method of difference, and predicated static single assignment code.

The Compaan compiler framework automates the transformation of DSP applications written in Matlab into Kahn Process Networks (KPNs). These KPNs express the signal processing applications in a parallel distributed way making them more suitable for mapping onto parallel architectures. A simple instance of a generated KPN by Compaan is a Producer process that communicates with a Consumer process via a FIFO buffer, with the Consumer reading data from the FIFO using a blocking read. When the sequence of producing data is different from the sequence of consuming data, a simple FIFO is not sufficient to implement the communication. For such case, extra storage and control are needed at the consumer side. This paper presents a compile approach that determines whether a FIFO buffer is sufficient for every Producer/Consumer pair of a Compaan-generated KPN. When additional memory is required, we provide an address generation mechanism to perform the reordering and furthermore give a lower bound on the amount of memory needed to perform the reordering. The presented approach is based on the Ehrhart theory.

Research has been carried out to explore if, and how, it is possible to increase the efficiency in which open source code can be used. The research has shown that structuring such code can improve this efficiency. It has lead to a methodology and a set of techniques, which can be used to (re) structure the code.

The Compaan compiler automatically derives a Process Network (PN) description from an application written in Matlab. The basic element of a PN is a Producer/Consumer (P/C) pair . Four different communication patterns for a P/C pair have been identified and the complexity of communication structure differs depending on the communication pattern involved. Therefore, in order to obtain cost-efficient process networks our compiler automatically identifies the communication pattern of each P/C pair. This problem is equivalent to integer linear programming and thus in general can not be solved efficiently. In this paper we present simpler techniques that allow to classify the interprocess communication in a PN. However, in some cases those techniques do not allow to find an answer and therefore, an ILP test has still to be applied. Thus, we introduce a hierarchical classification scheme that correctly classifies the interprocess communication, but uses dramatically less integer linear programming. In only 5% of the cases to classify, we still relay on integer linear programming while in the remaining 95%, the techniques presented in this paper are able to classify a case correctly.

New heterogeneous multiprocessor platforms are emerging that are typically composed of loosely coupled components that exchange data using programmable interconnections. The components can be CPUs or DSPs, specialized IP cores, reconfigurable units, or memories. To program such platform, we use the Process Network (PN) model of computation. The localized control and distributed memory are the two key ingredients of a PN allowing us to program the platforms. The localized control matches the loosely coupled components and the distributed memory matches the style of interaction between the components. To obtain applications in a PN format, we have built the Compaan compiler that translates affine nested-loop programs into functionally equivalent PNs. In this paper, we describe a novel analytical translation procedure we use in our compiler that is based on integer linear programming. The translation procedure consists of four main steps and we will present each step by describing the main idea involved, followed by a representative example.

At Leiden Embedded Research Center, we are building a tool chain called Compaan/Laura that allows us to do fast mapping of applications written in Matlab onto reconfigurable platforms, such as FPGAs, using IP cores to implement the data-path of the applications. A particular characteristic of the derived networks is the existence of selfloops. These selfloops have a large impact on the utilization of IP cores in the final hardware implementation of a PN, especially if the IP cores are deeply pipelined. In this paper, we present an exploration methodology that uses feedback provided by the Laura tool to increase the utilization of IP cores embedded in our PN network. Using this exploration, we go from 60MFlops to 1,7GFlops for the QR algorithm using the same number of resources except for memory.

New embedded signal processing architectures are emerging that are composed of loosely coupled heterogeneous components like CPUs or DSPs, specialized IP cores, reconfigurable units, or memories. We believe that these architectures should be programmed using the Process Network model of computation. To ease the mapping of applications, we are developing the Compaan compiler that automatically derives a Process Network (PN) description from an application written in Matlab. In this paper, we investigate a particular problem in nested loop programs, which is about classifying the interprocess communication in the PN representation of the nested loop program. The global memory arrays present in the Matlab code have to be replaced by a distributed communication structure used for sending data to the network processes. We will show that four types of communication exists, each exhibiting different requirements when realizing them in hardware of software. We present two compile time tests that decide the type of a particular static array. These tests have become an important part of our Compaan compiler.

Energy efficient embedded systems consist of a heterogeneous collection of very specific building blocks, connected together by a complex network of many dedicated busses and interconnect options. The trend to merge multiple functions into one device makes the design and integration of these "systems-on-chip" (SOC's) even more problematic. Yet, specifications and applications are never fixed and require the embedded units to be programmable. The topic of this paper is to give the designer architectures and design techniques to find the right balance between energy efficiency and flexibility. The key is to include programmability (or reconfiguration) at the right level of abstraction and tuned to the application domain. The challenge is to provide an exploration and programming environment for this heterogeneous architecture platform.

New emerging embedded system platforms in the realm of high-throughput multimedia, imaging, and signal processing will consist of multiple microprocessors and reconfigurable components. One of the major problems is how to program these platforms in a systematic and automated way so as to satisfy the performance need of applications executed on these platforms.In this paper, we present our system design approach as an efficient solution to this programming problem. We show how for an application written in Matlab, a Kahn Process Network specification can automatically be derived and systematically mapped onto a target platform composed of a microprocessor and an FPGA. Furthermore, we illustrate how the mapping approach is applied on a real-life example, namely an M-JPEG encoder.

At Leiden Embedded Research Center (LERC), we are building a tool chain called Compaan/Laura that allows us to map fast and efficiently application written in Matlab onto reconfigurable platforms. In this chain, first the Matlab code is converted automatically to executable Kahn Process Network (KPN) specification. Then a tool called Laura accepts this specification and transforms the specification into design implementations described as synthesizable VHDL. In this paper, we present our methodology implemented in Laura tool to automatically convert KPNs to synthesizable VHDL code targeted form mapping onto FPGA-based platforms. With the help of Laura , a designer is able to either fast prototype signal processing and multimedia applications directly in hardware or to extract very fast valuable low-level quantitative implementation data such as performance in terms of clock cycles or silicon area.

At the Leiden Embedded Research Center, we are developing a compiler called Compaan that automatically translates signal processing applications written in Matlab into Kahn Process Networks (KPNs). In general, these signal processing applications are data-flow intensive, requiring large storage capacities, usually represented by matrices. An important issue in Compaan is the derivation of a memory management mechanism that allows for efficient inter-process communication. This mechanism has previously been published and is called the Extended Linearization Model (ELM). The controller needed in the ELM is derived using the Ehrhart theory, leading to a computational intensive procedure. In this paper, we present a new approach to derive the ELM controller, based on the notion of Lexicographically Maximal Preimage. Using polytope manipulations and parametric integer linear programming techniques, we get less computational intensive and easier to be derived controller implementation for the ELM.

At the Leiden Embedded Research Center, we are working towards a framework called Compaan that automates the transformation of digital signal processing (DSP) applications to Kahn Process Networks (KPNs). These applications are written in Matlab as parameterized nested loop programs. This transformation is interesting as KPNs are well suited for mapping onto parallel architectures. Although the KPN semantic always assumes that FIFO buffers can be used between processes, we have found cases in which the FIFO is not enough as data may arrive in the wrong order. To solve this order problem, we previously presented the Extended Linearization Model (ELM) that describes a mechanism to reorder tokens. The introduction of the ELM does not violate the Kahn Process Network semantics; we still use a FIFO between a Producer and Consumer. The ELM relays on some additional memory and a controller to perform the reordering. The ELM model can be implemented in different ways. In this chapter, we investigate four different realizations of the ELM. The realizations differ in the computational complexity of performing the reordering, the kind of reordering memory used, and the size of the reordering memory.

Compaan is a software tool that is capable of automatically translating nested loop programs, written in Matlab, into parallel process network descriptions suitable for implementation in hardware. In this article, we show a methodology and tool to convert these process networks into FPGA implementations. We will show that we can in principle obtain high performing realizations in a fraction of the design time currently employed to realize a parameterized implementation. This allows us to rapidly explore a range of transformations, such as loop unrolling and skewing, to generate a circuit that meets the requirements of a particular application. The QR decomposition algorithm is used to demonstrate the capability of the tool. We present results showing how the number of clock cycles and calculations-per-second vary with these transformations using a simple implementation of the function units. We also provide an indication of what we expect to achieve in the near future once the tools are completed and applied the transformations to parallel, highly pipelined implementations of the function units.

The Compaan compiler framework automates the transformation of DSP applications written in Matlab into Kahn Process Networks (KPNs). These KPNs express the signal processing applications in a parallel distributed way making them more suitable for mapping onto parallel architectures. A simple instance of a generated KPN by Compaan is a Producer process that communicates with a Consumer process via a FIFO buffer, with the Consumer reading data from the FIFO using a blocking read. When the sequence of producing data is different from the sequence of consuming data, a simple FIFO is not sufficient to implement the communication. For such case, extra storage and control are needed at the consumer side. This paper presents a novel approach that determines at compile time whether a FIFO buffer is sufficient for every Producer/Consumer pair of a Compaan-generated KPN. For the case when the additional memory is required, we also provide an address generation mechanism at compile time. The presented approach is based on the Ehrhart theory.

Following the Y-chart paradigm for designing a system, an application and an architecture are modeled separately and mapped onto each other in an explicit design step. Next, a performance analysis for alternative application instances, architecture instances and mappings has to be done, thereby exploring the design space of the target system. Deriving alternative application instances is not trivially done. Nevertheless, many instances of a single application exist that are worth to be derived for exploration. In this paper, we present algorithmic transformation techniques for systematic and fast generation of alternative application instances that express task-level concurrency hidden in an application in some degree of explicitness. These techniques help a system designer to speedup significantly the design space exploration process.

Embedded systems architectures are increasingly becoming programmable , which means that an architecture can execute a set of applications instead of only one. This makes these systems cost-effective, as the same resources can be reused for another application by reprogramming the system. To design these programmable architectures, we present in this article a number of concepts of which one is the Y-chart approach. These concepts allow designers to perform a systematic exploration of the design space of architectures. Since this design space may be huge, it is narrowed down in a number of steps. The concepts presented in this article provide a methodology in which architectures can be obtained that satisfies a set of constraints while establishing enough flexibility to support a given set of applications.

Compaan is a software tool capable of automatically translating nested loop programs, written in Matlab, into parallel Kahn process network descriptions suitable for implementation in hardware. In this paper we present a tool for converting these process networks into FPGA implementations. The QR decomposition algorithm is used to demonstrate the capability of the tool to quickly generate high-performance parallel implementations. This allows us to rapidly explore a range of transformations, such as loop unrolling and skewing, to generate a circuit that meets the requirements of a particular application. We present results showing how the control logic complexity and number of clock cycles vary with these transformations

Modeling applications and architectures at various levels of abstraction is becoming more and more an accepted approach in embedded system design. When looking at the modeling of applications in the domain of video, audio, and graphics applications, we notice that they exhibit a high degree of task parallelism and operate on streams of data. Models that we can use to specify such stream-based applications on a high level of abstraction are the dataflow models and process network models. Each of these models has its own merits. Therefore, an alternative approach is to introduce a model of computation that combines the semantics of both models of computation. In this paper, we introduce such a model of computation, which we call the Stream-Based Functions (SBF) model of computation and show an example. Furthermore, we discuss the composition and decomposition of SBF objects and put the SBF model of computation in the context of relevant dataflow models and process network models.

High level modeling and (quantitative) performance analysis of signal processing systems requires high level models for the applications (algorithms) and the implementations (architectures), a mapping of the former into the latter, and a simulator for fast execution of the whole. Signal processing algorithms are very often nested-loop algorithms with a high degree of inherent parallelism. This paper presents - for such applications - a suitable application model and a method to convert a given imperative executable specification to a specification in terms of this application model. The methods and tools are illustrated by means of an example.

High level modeling and (quantitative) performance analysis of signal processing systems requires high level models for the applications (algorithms) and the implementations (architecture), a mapping of the former into the latter, and a simulator for fast execution of the whole. Signal processing algorithms are very often nested-loop algorithms with a high degree of inherent parallelism. This paper presents - for such applications - suitable application and implemetation models, a method to convert a given imperative executable specification to a specification in terms of the application model, a method to map this specification into an architecture specification in terms of the implementation model, and a method to analyze the performance through simulation. The methods and tools are illustrated by means of an example.

This paper presents the Compaan tool that automatically transforms a nested loop program written in Matlab into a process network specification. The process network model of computation fits better with the new emerging kind of embedded architectures that use coprocessors. Process networks can describe both fine-grained and coarse-grained parallelism, making the mapping of the applications easier.

This paper presents MatParser , which is an array analysis compiler that automatically converts an affine nested loop program into a single assignment program. The nested loop programs may contain non-linear operators like div/mod/floor/ceil and stepsizes other than one. The focus of this article is on the software architecture used in MatParser to resolve the data dependencies. Finding that two variables are dependent on each other and at which iteration, is a very computational intensive procedure. MatParser employs a particular linear programming technique to find the data-dependencies, based on parametric integer programming (PIP) as proposed by Paul Feautrier. To appreciate the implementation of MatParser, we will explain in this paper in enough detail the basics of the linear programming technique used.

In wireless communications of tomorrow, high-resolution space-time filtering techniques will play a crucial role. To make these techniques broadly available, low-power, low-cost embedded processors for efficient implementation must be designed. As many of these space-time filters implement routines that are built on robust Jacobi-type algorithms, it makes sense to design a processor with a good performance-cost ratio over a subset of the Jacobi-type algorithms. This paper summarizes the steps in a design procedure that takes as its input executable specifications in Matlab of a subset of Jacobi-type algorithms and outputs one or more candidate processors for efficient execution of these algorithms. The procedure heavily relies on models of computation and models of realization to specify algorithm and architecture instances for evaluation of performance and cost metrics through extensive simulation. Details of the various steps can be found in the list of references.

Novel high-performance, domain specific, embedded architectures are more and more composed of a microprocessor, some memory, and a number of dedicated coprocessors. Onto these embedded architectures, applications will be executed that belong to the domain of multi-media processing, mobile communication, and adaptive array processing. In general, these applications are written using an imperative model of computation most commonly C or Matlab. A better specification format would be to use an inherent parallel model of computation like Process Networks. This paper presents a three-step approach to automatically transform a class of Matlab specification into a process network specification.

Systems in the domain of high-performance video signal processing are becoming more and more programmable. We suggest an approach to design such systems that involves measuring, via simulation, the performance of various architectures on which a set of applications are mapped. This approach requires a retargetable simulator for an architecture template. We describe the retargetable simulator that we constructed for a stream-oriented application-specific dataflow architecture. For each architecture instance of the architecture template, a specific simulator is derived in three steps: the architecture instance is constructed, an execution model is added, and the executable architecture is instrumented to obtain performance numbers. We used object oriented principles together with a high-level simulation mechanism to ensure retargetability and an efficient simulation speed. Finally, we explain how a retargetable simulator can be encapsulated within an environment for automated design space exploration.

We explore the efficiency of a class of stream-oriented dataflow architectures as a function of the grain-size, for a given set of applications. We believe the grain-size is a key parameter in balancing flexibility and efficiency of this class of architectures. We apply a clustering approach on the well-defined set of applications to determine a set of weakly programmable processing elements of different grain-sizes. The resulting architectures are compared with respect to their silicon area. For a set of twenty-one industrially relevant video algorithms, we determined architectures with various grain- sizes. The results of this exercise indicate an improvement factor of two for the silicon area, while changing the grain-size from fine-grain to coarser- grain.

In this paper we present an approach for quantitative analysis of application- specific dataflow architectures. The approach allows the designer to rate design alternatives in a quantitative way and therefore supports him in the design process to find better performing architectures. The context of our work is Video Signal Processing algorithms which are mapped onto weakly- programmable, coarse-grain dataflow architectures. The algorithms are represented as Kahn graphs with the functionality of the nodes being coarse- grain functions. We have implemented an architecture simulation environment that permits the definition of dataflow architectures as a composition of architecture elements, such as functional units, buffer elements and communication structures. The abstract, clock-cycle accurate simulator has been built using a multi-threading package and employs object oriented principles. This results in a configurable and efficient simulator. Algorithms can subsequently be executed on the architecture model producing quantitative information for selected performance metrics. Results are presented for the simulation of a realistic application on several dataflow architecture alternatives, showing that many different architectures can be simulated in modest time on a modern workstation.

This paper describes the conversion of nested loop programs into single assignment forms. The nested loop programs may contain the integer operators: integer division , floor , ceil , and modulo , in expressions and the stride, or step size, of for loops may be greater than one. The programs may be parametrized but must have static control. The conversion is done completely automatical by the tool HiPars . The output of HiPars is a single assignment program (SAP) and a dependence graph (DG). The description of the dependence graph is based on linearly bounded lattices. We will show the relation between the integer division operators in the SAP and linearly bounded lattices in the corresponding DG.

Bart Kienhuis

Vital Statistics

Bio

Master and PhD projects

Involved Research Projects (Dutch/European)

Presentations:

Publications:

Previous Research Work: