
MATSUMOTO Takashi

Faculty Division of Human Life and Environmental Sciences, Research Group of Information and Communication Technology for Life, Professor
Last Updated: 2025/04/27

■researchmap

Profile Information

  • Name

    Matsumoto Takashi

Degree

  • Ph.D. (Science), The University of Tokyo, Sep. 2001

Research Interests

  • deep learning
  • Memory-Based Communication Facility
  • distributed processing
  • parallel processing
  • operating system
  • computer architecture

Research Areas

  • Informatics, Intelligent informatics, Deep Learning
  • Informatics, Computer systems

Research History

  • Jul. 2013 - Present, Nara Women's University, Life Computing and Communication Technology, Human Life and Environmental Science, Professor, Japan
  • Dec. 2002 - Present, Information Science Laboratory Ltd. (a university-launched venture company), Representative Director and Executive Vice President (concurrent position utilizing research results), Japan
  • Jun. 2011 - Jun. 2013, CANON IMAGING SYSTEMS, Senior Staff Engineer, Japan
  • Apr. 2002 - May 2011, National Institute of Informatics, Associate Professor, Japan
  • Nov. 1991 - Mar. 2002, Department of Information Science, University of Tokyo, Assistant Professor
  • Apr. 1987 - Oct. 1991, IBM Japan Tokyo Research Laboratory, Researcher, Japan

Education

  • Sep. 2001, The University of Tokyo, Graduate School of Science, Ph.D. (Science), awarded by dissertation, Japan
  • Apr. 1985 - Mar. 1987, Osaka City University, Graduate School of Science, Department of Physics, Japan
  • Apr. 1981 - Mar. 1985, The University of Tokyo, Faculty of Engineering, Department of Mathematical Engineering and Information Physics (Mathematical Engineering Course), Japan

Teaching Experience

  • Oct. 2018 - Present
  • Oct. 2018 - Present
  • Apr. 2018 - Present
  • Apr. 2018 - Present
  • Apr. 2016 - Present
  • Apr. 2016 - Present
  • Oct. 2015 - Present
  • Oct. 2015 - Present
  • Oct. 2014 - Present
  • Apr. 2014 - Present
  • Oct. 2017 - Mar. 2018
  • Sep. 2013 - Mar. 2018
  • Oct. 2014 - Mar. 2015

Professional Memberships

  • Information Processing Society of Japan (IPSJ)
    Jan. 2020 - Present
  • Nara Women's University Society of Home Economics
    Jul. 2013 - Present

■II. Research Activities

Published Papers

  • Not Refereed, Trend extraction from Instagram images by deep learning -Aim to detect new trends-, Kaho Ito; Takashi Matsumoto, Mar. 2023, 2023-MPS-142, 11, 1, 6, Symposium
  • Not Refereed, ICBI-Based Real-Time Super-Resolution System with Variable Magnification, Mayu Kondo; Takashi Matsumoto, Dec. 2022, 2022-MPS-141, 7, 1, 6, Symposium
  • Not Refereed, Basic Performance Reports of Memory-Based Communication Facility in the Linux kernel, Aya Oya; Yukino Inomata; Keiji Uemura; Takashi Matsumoto, Mar. 2022, 2022-SLDM-198, 49, 1, 7, Symposium
  • Not Refereed, Memory-Based Communication Facility in the Linux kernel, Takashi MATSUMOTO, Nov. 2021, 2021, 27, 36, Symposium
  • Not Refereed, Toward Expanding the Variety of Programming Fields for Elementary School Students, Hina AKAHORI; Takashi MATSUMOTO, Nov. 2021, 2021-MPS-136, 13, 1, 6, Symposium
  • Not Refereed, Feb. 2020, 2020-MPS-127, 13, 1, 6, Symposium
  • Not Refereed, Feb. 2020, 2020-MPS-127, 15, 1, 6, Symposium
  • Not Refereed, Refinement of a real-time super-resolution FPGA circuit, Takashi Matsumoto; Mayo Sanada; Suzuka Yasunami; Kazuki Joe, Jul. 2018, 2018-MPS-119, 13, 1, 5, Symposium
  • Refereed, Proceedings of 2018 International Conference on Parallel and Distributed Processing Techniques and Applications, Refinement of a real-time super-resolution FPGA circuit, Takashi Matsumoto; Mayo Sanada; Suzuka Yasunami; Kazuki Joe, Jul. 2018, 347, 353, International conference proceedings
  • Not Refereed, Feb. 2018, 2018-MPS-117, 14, 1, 6, Symposium
  • Refereed, Oct. 2017, 2017, supplementary volume, 225, 228, Research institution
  • Not Refereed, Jul. 2016, 2016-MPS-109, 11, 1, 4, Symposium
  • Refereed, Proceedings of 2016 International Conference on Parallel and Distributed Processing Techniques and Applications, Real-Time Super Resolution: FPGA Implementation for the ICBI Algorithm, Takashi Matsumoto; Arisa Yamamoto; Kazuki Joe, Jul. 2016, 415, 420, International conference proceedings
  • Not Refereed, Mar. 2014, 2014-SLDM-165, 27, 1, 6, Symposium
  • Refereed, Apr. 2001, 42, 4, 879, 897, Scientific journal
  • Refereed, Dissertation Thesis, Graduate School of Science, Univ. of Tokyo, A Study on Memory-Based Communications and Synchronization in Distributed-Memory Systems, Matsumoto, T, Feb. 2001, Research institution
  • Refereed, Proc. of the 9th Workshop on Scalable Shared Memory Multiprocessors, On Scalability Issue of Directory Schemes of Hardware Distributed Shared Memory., Tanaka, K; Matsumoto, T; Hiraki, K, Jun. 2000, International conference proceedings
  • Refereed, Proc. of the 2000 Int. Conf. on Supercomputing (ICS’00), ACM press, Comparative Study of Page-based and Segment-based Software DSM through Compiler Optimization, Niwa, J; Matsumoto, T; Hiraki, K, May 2000, 284, 295, International conference proceedings
  • Refereed, Proc. of Int. Workshop on Innovative Architecture for Future Generation High Performance Processors and Systems (IWIA’99) IEEE Computer Society Press, Evaluation of Compiler-Assisted Software DSM Schemes for a Workstation Cluster., Niwa, J; Inagaki, T; Matsumoto, T; Hiraki, K, 2000, 57, 68, International conference proceedings
  • Refereed, May 1999, 40, 5, 2256, 2268, Scientific journal
  • Refereed, May 1999, 40, 5, 2025, 2036, Scientific journal
  • Refereed, Proceedings - 6th International Conference on Real-Time Computing Systems and Applications, RTCSA 1999, Institute of Electrical and Electronics Engineers Inc., On the schedulability conditions on partial time slots, S. Shirero; M. Takashi; H. Kei, Real-time round robin, a novel real-time scheduling algorithm, is proposed in this paper. It is a time slot-based algorithm. Tasks are divided into groups and each group of tasks is statically assigned a subset of time slots. In a group, tasks are scheduled by earliest deadline first (EDF). We introduce "regular" subsets of time slots. This has the advantage that any periodic task can be scheduled only at time slots contained in the subset, using the minimum number of time slots. We show a method to divide the universal set of time slots into at least two regular subsets. Consequently, the real-time round robin algorithm can schedule periodic tasks whose processor utilization factor does not exceed 100% at a lower scheduling cost than that of the EDF algorithm. Moreover, no missed deadline of a task in one group affects the tasks in any other group., 1999, 166, 173, International conference proceedings, 10.1109/RTCSA.1999.811212
  • Refereed, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, Performance evaluation of MPI/MBCF with the NAS parallel benchmarks, Kenji Morimoto; Takashi Matsumoto; Kei Hiraki, MPI/MBCF is a high-performance MPI library targeting a cluster of workstations connected by a commodity network. It is implemented with the Memory-Based Communication Facilities (MBCF), which provides software mechanisms for users to access remote task’s memory space with off-the-shelf network hardware. MPI/MBCF uses Memory-Based FIFO for message buffering and Remote Write for communication without buffering from among the functions of MBCF. In this paper, we evaluate the performance of MPI/MBCF on a cluster of workstations with the NAS Parallel Benchmarks. We verify whether a message passing library implemented on the shared memory model achieves higher performance than that on the message passing model., 1999, 1697, 19, 26, International conference proceedings, 10.1007/3-540-48158-3_3
  • Refereed, Proc. of The Fifth Int. Symp. on High Performance Computer Architecture (HPCA5), Lightweight Hardware Distributed Shared Memory Supported by Generalized Combining, Kiyofumi Tanaka; Takashi Matsumoto; Kei Hiraki, Jan. 1999, 90, 99, International conference proceedings
  • Refereed, Proc. of The International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA-98),, Run-time Loop Restructuring for On-Chip Parallel Processor., Tamatsukuri, J; Matsumoto, T; Hiraki, K, Jul. 1998, 3, 1489, 1496, International conference proceedings
  • Refereed, Proc. of The International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA-98), Compiler-Assisted Distributed Shared Memory Schemes Using Memory-Based Communication Facilities, Matsumoto, T; Niwa, J; Hiraki, K, Jul. 1998, 2, 875, 882, International conference proceedings
  • Refereed, Proc. of the 1998 ACM Int. Conf. on Supercomputing, Speculative execution model with duplication, Hiraki, K; Tamatsukuri, J; Matsumoto, T, Jul. 1998, 321, 328, International conference proceedings
  • Refereed, Proc. of the 1998 ACM Int. Conf. on Supercomputing, MBCF: A Protected and Virtualized High-Speed User-Level Memory-Based Communication Facility, Matsumoto, T; Hiraki, K, Jul. 1998, 259, 266, International conference proceedings
  • Refereed, Jun. 1998, 39, 6, 1729, 1737, Scientific journal
  • Refereed, Jun. 1998, 39, 6, 1738, 1745, Scientific journal
  • Refereed, May 1998, 15, 3, 54, 58, Scientific journal
  • Refereed, May 1998, 15, 3, 59, 63, Scientific journal
  • Refereed, Proc. of The 20th Int. Conf. on Software Engineering, A general-purpose scalable operating system: SSS-CORE, Matsumoto, T; Uzuhara, S; Hiraki, K, Apr. 1998, 2, 147, 152, International conference proceedings
  • Not Refereed, Architecture for Future Generation High Performance Processors and Systems, IEEE Computer Society, Memory-Based Communication Facilities and Asymmetric Distributed Shared Memory., Matsumoto, T; Hiraki, K, Apr. 1998, 30, 39, International conference proceedings
  • Refereed, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, Implementing MPI with the memory-based communication facilities on the SSS-CORE operating system, Kenji Morimoto; Takashi Matsumoto; Kei Hiraki, This paper describes an efficient implementation of MPI on the Memory-Based Communication Facilities. Memory-Based FIFO is used for buffering by the library, and Remote Write for communication with no buffering. The Memory-Based Communication Facilities are software-based communication mechanisms, with off-the-shelf Ethernet hardware. They provide low-cost and highly-functional primitives for remote memory accesses. The performance of the library was evaluated on a cluster of workstations connected with a 100Base-TX network. The round-trip time was 71μs for a 0-byte message, and the peak bandwidth was 11.86 Mbyte/s in full duplex mode. These values show that it is more efficient to realize message passing libraries with the shared memory model than with the message passing model., 1998, 1497, 223, 230, International conference proceedings, 10.1007/BFb0056579
  • Refereed, Proceedings of the International Conference on Parallel Processing, Institute of Electrical and Electronics Engineers Inc., Supporting software distributed shared memory with an optimizing compiler, Tatsushi Inagaki; Junpei Niwa; Takashi Matsumoto; Kei Hiraki, To execute a shared memory program efficiently, we have to manage memory consistency with low overheads, and have to utilize communication bandwidth of the platform as much as possible. A software distributed shared memory (DSM) can solve these problems via proper support by an optimizing compiler. The optimizing compiler can detect shared write operations, using interprocedural points-to analysis. It also coalesces shared write commitments onto contiguous regions, and removes redundant write commitments, using interprocedural redundancy elimination. A page based target software DSM system can utilize communication bandwidth, owing to coalescing optimization. We have implemented the above optimizing compiler and a run time software DSM on AP1000+. We have obtained a high speed-up ratio with the SPLASH-2 benchmark suite. The result shows that using an optimizing compiler to assist a software DSM is a promising approach to obtain a good performance. It also shows that the appropriate protocol selection at a write commitment is an effective optimization., 1998, 225, 235, International conference proceedings, 10.1109/ICPP.1998.708490
  • Refereed, Proc. of Int. Symp. on Parallel Architectures, Algorithms and Networks (I-SPAN’97), Efficient Implementation of Software Release Consistency on Asymmetric Distributed Shared Memory, Niwa, J; Inagaki, T; Matsumoto, T; Hiraki, K, Dec. 1997, 198, 201, International conference proceedings
  • Refereed, Nov. 1997, 101, 108, Symposium
  • Refereed, Proc. of International Symposium on High Performance Computing, Springer-Verlag LNCS 1336, Resource Management Methods for General Purpose Massively Parallel OS SSS-CORE, Nobukuni, Y; Matsumoto, T; Hiraki, K, Nov. 1997, 255, 266, International conference proceedings
  • Refereed, Proc. of the 1997 ACM Int. Conf. on Supercomputing, An I/O Network Architecture of the Distributed Shared-Memory Massively Parallel Computer JUMP-1, Nakajo, H; Ohtani, S; Matsumoto, T; Kohata, M; Hiraki, K; Kaneda, Y, Jul. 1997, 253, 260, International conference proceedings
  • Refereed, May 1997, 21, 28, Symposium
  • Refereed, Nov. 1996, 37, 44, Symposium
  • Refereed, Jul. 1996, 37, 7, 1429, 1439, Scientific journal
  • Refereed, Proc. of Second Int. Symp. on Parallel Architectures, Algorithms and Networks (I-SPAN’96), IEEE Computer Society, Distributed Shared Memory Architecture for JUMP-1: A General-Purpose MPP Prototype, Matsumoto, T; Nishimura, K; Kudoh, T; Hiraki, K; Amano, H; Tanaka, H, Jun. 1996, 131, 137, International conference proceedings
  • Refereed, Proc. of 7th IASTED-ISMM Int. Conf. on Parallel and Distributed Computing and Systems, High Performance I/O System of the Distributed Shared-Memory Massively Parallel Computer JUMP-1, Nakajo, H; Matsumoto, T; Kohata, M; Matsuda, H; Hiraki, K; Kaneda, Y, Nov. 1995, 470, 473, International conference proceedings
  • Refereed, Proc. of the 1995 Int. Conf. on Parallel Processing, Hierarchical bit-map directory schemes on the RDT interconnection network for a massively parallel processor JUMP-1, Kudoh, T; Amano, H; Matsumoto, T; Hiraki, K; Yang, Y; Nishimura, K; Yoshimura, K; Fukushima, Y, Aug. 1995, 1, 186, 193, International conference proceedings
  • Refereed, Jul. 1995, 36, 7, 1652, 1661, Scientific journal
  • Refereed, May 1995, 67, 74, Symposium
  • Refereed, Proc. Intl. Symp. Parallel Architectures, Algorithms and Networks, Overview of the JUMP-1, an MPP Prototype for General-Purpose Parallel Computations, Kei Hiraki; Hideharu Amano; Morihiro Kuga; Toshinori Sueyoshi; Tomohiro Kudoh; Hiroshi Nakashima; Hironori Nakajo; Hideo Matsuda; Takashi Matsumoto; Shin-ichiro Mori, We describe the basic architecture of JUMP-1, an MPP prototype developed by collaboration between 7 universities. The proposed architecture can exploit high performance of coarse-grained RISC processor performance in connection with flexible fine-grained operation such as distributed shared memory, versatile synchronization and message communications., Dec. 1994, 427, 434, International conference proceedings
  • Not Refereed, Proc. of IEEE Region 10’s Ninth Annual Int. Conf. (TENCON), Complementary Hybrid Architecture with Two Different Processing Elements with Different Grain Size, Hiraki, K; Matsumoto, T, Aug. 1994, 1, 324, 331, International conference proceedings
  • Refereed, Jun. 1994, 289, 302, Symposium
  • Refereed, May 1994, 349, 356, Symposium
  • Not Refereed, May 1994, 409, 418, Symposium
  • Refereed, May 1994, 137, 144, Symposium
  • Refereed, May 1994, 73, 80, Symposium
  • Refereed, Jan. 1994, 35, 1, 92, 101, Scientific journal
  • Refereed, Proc. of the 1993 ACM Int. Conf. on Supercomputing,, Dynamic Switching of Coherent Cache Protocols and its Effects on Doacross Loops, Matsumoto, T; Hiraki, K, Jul. 1993, 328, 337, International conference proceedings
  • Refereed, May 1993, 245, 252, Symposium
  • Refereed, Apr. 1993, 34, 4, 616, 627, Scientific journal
  • Refereed, Journal of Information Processing, IPS Japan, Information Processing Society of Japan (IPSJ), Efficient Execution of Fine-Grain Parallelism on a Tightly-Coupled Multiprocessor., Matsumoto, T, In multiprocessor systems, the overheads caused by inter-processor synchronization and communication continue to be impediments to the efficient execution of parallel programs. Reduction of these types of overhead is necessary in systems that focus on large-scale and fine-grain parallelism. This paper proposes a Fine-Grain Multi-Processor (FGMP) based on a shared-memory/shared-bus architecture, which can efficiently handle fine-grain concurrency in parallel. New strategies for management of hardware resources in the system are discussed, and two innovative hardware mechanisms are proposed that work well for fine-grain parallelism with the above strategies: Elastic Barrier (a light synchronization mechanism), which is derived from a generalization of a barrier-type mechanism, and an Inter-Cache Snoop Control Mechanism that switches snoop-protocols dynamically to reduce the overhead associated with shared data handling. After introducing the FGMP system, which incorporates the above strategies and mechanisms, the paper closes with a discussion of the FGMP's characteristics and efficiency., Nov. 1992, 15, 3, 474, 484, Scientific journal
  • Refereed, Jun. 1992, 375, 382, Symposium
  • Refereed, Jun. 1992, 297, 304, Symposium
  • Refereed, Nov. 1991, 191, 200, Symposium
  • Refereed, Proc. of the 1991 Int. Conf. on Parallel Processing, MISC: A Mechanism for Integrated Synchronization and Communication Using Snoop Caches, Matsumoto, T; Tanaka, T; Moriyama, T; Uzuhara, S, Aug. 1991, 1, 161, 170, International conference proceedings
  • Refereed, Jul. 1991, 32, 7, 886, 896, Scientific journal
  • Refereed, Dec. 1990, 31, 12, 1840, 1851, Scientific journal
  • Refereed, May 1990, 49, 56, Symposium
  • Not Refereed, 3- 2025, 2025-EMB-68, 15, 1, 6
  • Not Refereed, 3- 2025, 2025-ARC-260, 4, 1, 6
  • Not Refereed, 2- 2024, 2024-MPS-151, 11, 1, 6
  • Not Refereed, 2- 2024, 2024-MPS-151, 14, 1, 6
  • Not Refereed, 2- 2024, 2024-MPS-151, 15, 1, 6
  • Not Refereed, 1- 2023, 2023, 21, 28
  • Not Refereed, 1- 2023, 2023, 29, 35
  • Not Refereed, 2- 2023, 2023-MPS-146, 15, 1, 6
  • Not Refereed, 3- 2022, 2022-ARC-248, 48, 1, 7
  • Not Refereed, 3- 2022, 2022-ARC-248, 49, 1, 7
  • Not Refereed, 2- 2023, 2023-MPS-146, 20, 1, 6
  • Not Refereed, 2- 2023, 2023-MPS-146, 6, 1, 7
  • Not Refereed, 2- 2023, 2023-MPS-146, 5, 1, 7
  • Not Refereed, 2- 2024, 2023-MPS-147, 25, 1, 7

MISC

  • IEICE technical report. Dependable computing, The Institute of Electronics, Information and Communication Engineers, Definition and potential trends of CSoC (Configurable System-on-Chip), MATSUMOTO TAKASHI; JOE KAZUKI, In the days when SoCs (System-on-Chip) were becoming practical thanks to the large integration degree of LSIs, the development of ASICs (Application Specific ICs), which allow differentiation by product or company, was still thriving. Such trends gave Japanese semiconductor industries the prediction that SoC-type ASIC development would spread at an accelerating pace, and ASIC development was widely encouraged. However, it is the ASSP (Application Specific Standard Product) for general customers that has survived among the various SoCs, because of soaring development costs for highly integrated LSIs. In this way, the current hot LSI target is the ASSP as an SoC. Many ASSP SoC products provide many more functions on a chip than the number of external pins would suggest, so that users can select the functions they need and/or the roles of the external pins they use. We call an ASSP SoC with the above features a CSoC. In this paper we describe why the CSoC has survived among semiconductor products, and discuss the future of the CSoC., 15 Mar. 2014, 113, 498, 157, 162
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Binary Translation for Run-time Restructuring, TAMATSUKURI Junji; MATSUMOTO Takashi; HIRAKI Kei, Run-time restructuring executes sequential programs in parallel while analyzing and reconstructing them during execution. We propose optimizing run-time restructuring by binary translation for more effective parallel execution. The binary translation mechanism needs only the speculative instructions to use the run-time restructuring hardware. It analyzes the program, and its control transfers and memory accesses with parallelism are replaced with speculative instructions. The effective translation decreases the overhead of the run-time analysis and extracts much more of the performance potential contained in the sequential program., 18 Jul. 2001, 101, 216, 55, 62
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Methods of Reducing Cache-miss Traffic at Fetch-on-write in Software DSM, NIWA JUMPEI; MATSUMOTO TAKASHI; HIRAKI KEI, In the compiler-assisted software DSM scheme, an optimizing compiler can analyze data access patterns and eliminate coherence management operations for blocks whose data are written but not read. As a result, the run-time system need not fetch data to update the blocks. We have implemented this optimizing technique in the optimizing compiler called "Remote Communication Optimizer" (RCOP). The experimental results using the SPLASH-2 benchmark suite on the SS20 cluster show that this approach is effective., 08 Dec. 2000, 2000, 114, 49, 54
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Performance Evaluation of Load Balancing in an HTTP Server Using Resource Information on a General-Purpose Cluster, ODAIRA REI; MATSUMOTO TAKASHI; HIRAKI KEI, An HTTP server on a general-purpose cluster consisting of combined workstations runs on a fixed number of nodes (fixed nodes) when the number of requests per second is constant. However, when the number of requests suddenly increases, the server should cope by dynamically utilizing machines which are not members of the fixed nodes. In this paper, we describe a model of an HTTP server that uses resource information provided by the OS on a general-purpose cluster, and determines whether to adjust the number of nodes in response to changes of CPU load on each node. Then, by simulation, we examine how the load-balancing performance of the server model changes with the load threshold on which the adjustment decision depends, and with the frequency of decisions. Simulation results show that the dynamic nodes method performs better than the fixed nodes method when optimal parameters are chosen., 04 Aug. 2000, 2000, 75, 31, 38
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Quantitative Evaluation of Scalable Directory Schemes in Hardware Distributed Shared Memory, TANAKA KIYOFUMI; MATSUMOTO TAKASHI; HIRAKI KEI, From the implementation of the hardware DSM system on the prototype machine, various values were obtained, such as the time required for a message to pass through a switch. In this paper, coherence processing (invalidation) on a large-scale system is considered in terms of the obtained values, and the hierarchical coarse directory with multicasting and combining is compared with the full-map directory. Moreover, we consider the size of memory required for the directories and the network traffic which the structure of the directories causes., 03 Aug. 2000, 2000, 74, 7, 12
  • Mar. 2000, 31, 4, 1, 2
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Casablanca: Design and Implementation of a Realtime RISC Core, TANAKA KIYOFUMI; MATSUMOTO TAKASHI; HIRAKI KEI, We have extended a general-purpose RISC architecture and developed a new RISC core, Casablanca, for realtime processing. The core has a current RISC architecture plus additional register sets used for trap/interrupt processing, and it achieves fast trap execution by switching register sets and reducing the overhead of saving/restoring register values. Moreover, extended instructions (inter-register-set instructions, cache-line forced instructions, byte twisting instructions) support convenient programming., 26 Nov. 1999, 1999, 100, 51, 56
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Board-Level Simulation of Architecture Based on High-Speed Serial Links, NIINO RYUTA; MATSUMOTO TAKASHI; HIRAKI KEI, We examine a hardware simulation environment for architecture based on linkage with high-speed serial links. Reconfigurable devices emulate major blocks such as memory controllers or link controllers. We reconstruct the internal circuits of the devices according to the specifications of the simulation targets. They simulate the target blocks by scaling down all the specifications at a constant rate. We simulate the architecture with OCHA-7, a parallel computer prototype based on the above-mentioned architecture. The target is a parallel architecture in which high-speed serial lines link memory chips and processor chips. We re-compose its topology according to the required bandwidth and memories. We modify the internal parameters and circuits on the boards according to the specification of the target, and evaluate the serial link blocks., 26 Nov. 1999, 1999, 100, 57, 62
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, A Scheduling Scheme Based on a Free Market Mechanism, MATSUMOTO Takashi; HIRAKI Kei, On existing systems, conventional scheduling methods use processors' utilization to keep fairness between users' application tasks (processes). As systems have come to provide non-blocking I/O facilities for user programs, new types of applications that eagerly exploit I/O devices or network communications are coming out. For these applications the bottlenecks of systems are not processor resources but I/O or network ones; conventional scheduling methods are therefore old-fashioned for these applications. In this paper a brand-new scheduling scheme, the "FMM scheme (Free Market Mechanism scheme)", is proposed for workstation cluster systems. In the FMM scheme, complicated global schedulers are unnecessary and dynamic optimizations are performed by user programs. The FMM scheme provides an information disclosure mechanism which enables user tasks to inexpensively access information on loads, configurations and usages of system resources. The FMM also presents fair node-level schedulers which take usage of I/O or communications into account., 04 Aug. 1999, 99, 251, 63, 70
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Evaluation of Compiler-Assisted DSM Schemes: ADSM and UDSM, NIWA JUNPEI; MATSUMOTO TAKASHI; HIRAKI KEI, We have proposed two compiler-assisted software-cache schemes. One is a page-based system (Asymmetric Distributed Shared Memory: ADSM) which exploits the TLB/MMU only in the case of read cache misses. The other is a segment-based system (User-level Distributed Shared Memory: UDSM) which uses only user-level checking codes and consistency management codes for the software cache. Under these schemes, an optimizing compiler directly analyses shared memory source programs and performs sufficient optimization. It exploits the capabilities of middle-grained or coarse-grained remote memory accesses in order to reduce the number and the amount of communications and to alleviate the overheads of user-level checking codes. It uses interprocedural points-to analysis, interprocedural redundancy elimination and coalescing optimization. We have implemented the above optimizing compiler for both schemes. We also have implemented runtime systems for user-level cache emulation. Both the ADSM runtime system and the UDSM runtime system run on the SS20 cluster connected with Fast Ethernet (100BASE-TX). We have revealed that both schemes achieve a high speed-up ratio with the SPLASH-2 benchmark suite., 02 Aug. 1999, 1999, 66, 95, 100
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Execution Time Behaviour of the SPLASH-2 Benchmark on a Software Simulator, TAKAGI Masamichi; MATSUMOTO Takashi; HIRAKI Kei, We evaluate the execution time behaviour of the SPLASH-2 [4] benchmark programs with an RT-level simulator (the MISC [1] simulator). We focus on the behaviour as the parameters of the memory system change. The simulator simulates shared-bus based centralized shared-address-space multiprocessors, and reflects the delay and interaction of the memory system., 02 Aug. 1999, 1999, 67, 31, 36
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Real-time Round Robin - An Efficient Dynamic Scheduling Algorithm, SASAKI Shigero; MATSUMOTO Takashi; HIRAKI Kei, The rate monotonic or earliest deadline first scheduling algorithm is often used as a real-time scheduling algorithm for periodic tasks. However, neither of them is optimized with respect to both scheduling cost and achievable processor utilization factor. Real-time Round Robin, a new real-time scheduling algorithm, is proposed in this paper. It can guarantee at low cost that every task finishes before its deadline, even when the task set requires most of the processor time, because it schedules a part of the task set dynamically, and the tasks to be scheduled, which are determined statically, vary over time. Moreover, a way to reduce the response time of aperiodic tasks is described., 30 Mar. 1999, 98, 687, 95, 102
  • Computer Clusters Based on Distributed Shared Memory, 15 Nov. 1998, 39, 11
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Implementing Parallel Speculative Execution of Loops on the JVM, YOSHIZOE KAZUKI; MATSUMOTO TAKASHI; HIRAKI KEI, There have been several proposals for hardware speculative execution, at a larger granularity than instruction-level parallelism, by partitioning the target program into blocks. We have applied speculative execution to the Java Virtual Machine, implemented on a shared memory machine. The target for speculative execution is limited to loops. We measured speedups for simple loops and found that it is possible to gain speedups for loops which contain more than 10000 instructions on an interpreting Java Virtual Machine., 06 Aug. 1998, 72, 1, 6
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Performance Evaluation of MPI/MBCF with Parallel Applications, MORIMOTO Kenji; MATSUMOTO Takashi; HIRAKI Kei, We evaluated the performance of MPI/MBCF by executing the NAS Parallel Benchmarks. MPI/MBCF is an MPI library implemented with the Memory-Based Communication Facilities (MBCF) on SSS-CORE, a general-purpose massively-parallel operating system. To implement MPI/MBCF, the Memory-Based FIFO of the MBCF is used for the message buffering provided by the MPI library, and Remote Write for communication without message buffering. This paper shows a performance evaluation of MPI/MBCF on a cluster of workstations with parallel applications, and verifies whether it is effective to construct a message passing library with the MBCF, which is based on the shared memory model., 06 Aug. 1998, 1998, 72, 103, 108
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), RPC implemented on the Memory-Based Communication Facility, KAMESAWA HIROYUKI; MATSUMOTO TAKASHI; HIRAKI KEI, We implemented a Remote Procedure Call (RPC) library based on the Memory-Based Communication Facility (MBCF). MBCF is the principal communication system of our massively parallel operating system (OS) SSS-CORE [Matsumoto, '94]. Today, the implementation of RPC is not a hot topic, but MBCF has strong features: (1) MBCF enables one process to write data directly into another process's memory space; (2) it guarantees transactions of data segments; (3) it is designed to work well with asynchronous data transactions. These features are useful for implementing "reduced copy", "work asynchronously with returning results" and "exactly once execution" RPC. In this paper, we discuss implementation techniques for client-server applications using MBCF, implement an RPC library based on SUNRPC 4.0 on SSS-CORE, and compare the performance of RPC on SunOS UDP, SSS-CORE UDP, and SSS-CORE MBCF., 06 Aug. 1998, 1998, 71, 9, 16
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), A General-Purpose Scalable OS SSS-CORE: Structure of the Kernel, MATSUMOTO TAKASHI; UZUHARA SHIGERU; TAKEOKA SHOZO; HIRAKI KEI, SSS-CORE is a general-purpose scalable operating system for NUMA parallel distributed systems. It provides a very efficient multi-tasking environment with timesharing and space partitioning. Furthermore, it tries to allow each parallel application to achieve maximum performance through cooperation of user- and kernel-level resource allocation and scheduling, and by offering low-latency, high-throughput memory-based communication facilities. SSS-CORE also provides a low-overhead mechanism that allows information transfer between the kernel and user level. In this paper we describe the structure of the SSS-CORE kernel, light-weight system calls, memory-based FIFOs, and memory-based signals. Finally we show the basic performance of the SSS-CORE Ver.1.1 system, which has been developed from scratch., 06 Aug. 1998, 1998, 71, 53, 60
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Memory-Based Processor II: A Commodity Supporting Middle-Grained Memory-Based Communication, MATSUMOTO TAKASHI; NOMURA MASAYOSHI; KUNISAWA RYOTA; HIRAKI KEI, We propose a novel network interface architecture, the "Memory-Based Processor II (MBP2)", which supports efficient middle-grained memory-based communications as well as legacy TCP/IP and UDP/IP. Although the hardware cost of the MBP2 is almost the same as that of conventional Network Interface Cards (NICs), in memory-based communications the MBP2 system is much superior to NIC systems., 05 Aug. 1998, 1998, 70, 103, 108
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), A performance evaluation of Run-time Restructuring architecture testbed "OCHA-Pro", TAMATSUKURI JUNJI; MATSUMOTO TAKASHI; HIRAKI KEI, An on-chip system composed of large-scale silicon resources is a candidate for the next-generation high-performance architecture. We have already proposed a "Run-time Restructuring" MIMD architecture which can execute sequential binary programs in parallel and efficiently through speculative execution and hardware parallelization. Run-time restructuring executes a loop construct in parallel using available on-chip resources. Large-scale speculative execution realizes non-recompiled, non-translated parallel execution; therefore run-time restructuring preserves "binary compatibility". We use a new clock-based simulator for the run-time restructuring testbed "OCHA-Pro (On-CHip MIMD Architecture Processor)". We examined the effects of run-time restructuring parallel execution, and then measured the ILP and run-time restructuring performance while varying element-processor ILP., 05 Aug. 1998, 1998, 70, 127, 132
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Evaluation of memory based communication supported by address translation hardware, KUNISAWA Ryota; MATSUMOTO Takashi; HIRAKI Kei, On a multi-user, multi-job parallel environment built upon workstation clusters, a fast user-level communication and synchronization method is needed. We are developing a high-speed, enhanced gigabit switching network system which cooperates with our general-purpose parallel operating system. Memory-based communication is the basic method for user-level communication. In order to protect and virtualize memory-based communication, the page management mechanism of the operating system is utilized. We have implemented a TLB for caching the results of address translation in our network card; in consequence, the overhead in communication is reduced., 04 Aug. 1998, 98, 233, 61, 66
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Performance Evaluation of Parallel Computer Prototype OCHANOMIZ-5, TANAKA Kiyofumi; MATSUMOTO Takashi; HIRAKI Kei, On a parallel/distributed system, it is necessary to provide efficient shared memory mechanisms for a general and convenient system. In this paper, we describe a lightweight method for constructing an efficient distributed shared memory system supported by hierarchical coherence management and generalized combining. In our method, the amount of memory required for directory is proportional to the logarithm of the number of clusters. This implies that only one word for each memory block is sufficient for covering a massively parallel system, and access costs of the directory are small. We have developed a prototype parallel computer, OCHANOMIZ-5, that implements this lightweight distributed shared memory and generalized combining with simple hardware. The results of evaluating the prototype's performance using several programs show that our methodology provides the advantages of parallelization., 04 Aug. 1998, 98, 233, 31, 38
  • A High-Speed User-Level Communication Mechanism for the General-Purpose Massively-Parallel OS: SSS-CORE, MATSUMOTO Takashi; HIRAKI Kei (Department of Information Science, Graduate School of Science, University of Tokyo), 15 May 1998, 15, 3, 247, 251
  • Compiling Techniques for ADSM on General-Purpose Massively-Parallel Operating System: SSS-CORE, NIWA Junpei; INAGAKI Tatsushi; MATSUMOTO Takashi; HIRAKI Kei (Department of Information Science, Graduate School of Science, University of Tokyo), 15 May 1998, 15, 3, 242, 246
  • Hardware Distributed Shared Memory of OCHANOMIZ-5, 17 Mar. 1998, 56, 155, 156
  • Network Interface card with address translation buffer: evaluation of memory based communication, 17 Mar. 1998, 56, 117, 118
  • Loop parallelizing mechanism on On-Chip MIMD Hardware, 17 Mar. 1998, 56, 167, 168
  • General-Purpose Massively-Parallel Operating System SSS-CORE: Compiler optimization for communication overhead reduction, 17 Mar. 1998, 56, 15, 16
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Low Cost Hardware Distributed Shared Memory, TANAKA KIYOFUMI; MATSUMOTO TAKASHI; TSUIKI JUN; HIRAKI KEI, A distributed shared memory system is essential to provide a general and convenient programming environment. On a distributed system, caching of remote data is effective for minimizing access latency. Hardware implementation of the protocol management mechanism reduces the runtime overheads that accompany consistency management of cache coherence. This paper describes a low-cost method to implement a distributed shared memory system with an efficient coherence management mechanism and a scalable directory structure, by equipping a memory controller with a simple circuit instead of employing extra SRAM chips for tag information. We implemented this mechanism on a parallel computer prototype system, OCHANOMIZ-5., 28 Oct. 1997, 1997, 102, 79, 84
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Shared Memory vs. Message Passing, MATSUMOTO TAKASHI; HIRAKI KEI, In this paper, "Distributed Shared Memory (DSM)" and "Message Passing Interface (MPI)" are compared and evaluated. As communication and/or synchronization models in programming languages, the choice between the two is only a matter of taste for programmers or language designers. However, from the viewpoints of execution overhead, freedom of usage, affinity for optimization, and cost of implementation, we can discuss which is the better one to equip in the system (hardware and operating system). We define DSM as the ability to access remote memory and classify it into two categories: one is called "Fine-grain DSM (F-DSM)" and the other "Coarse-grain DSM (C-DSM)". In F-DSM, remote memory accesses are extensions of the usual memory operations of processors. In C-DSM, request packets for remote memory accesses are made and transmitted by user-level programs, and the target systems process them without the assistance of user-level programs. We first conclude that C-DSM is much better than F-DSM owing to its affinity for compiler optimizations. Finally, we conclude that C-DSM is superior to MPI as the system-equipped function., 28 Oct. 1997, 1997, 102, 85, 90
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Performance Evaluation of Compiling Techniques on Asymmetric Distributed Shared Memory, NIWA JUNPEI; INAGAKI TATSUSHI; MATSUMOTO TAKASHI; HIRAKI KEI, We have proposed an "Asymmetric Distributed Shared Memory: ADSM", that realizes user-level protected high-speed communications/synchronizations. In the ADSM, the shared-read is based on a cache-based shared virtual memory system. As for the shared-write, instructions for consistency management are inserted after the corresponding store instruction. Therefore, various optimizations can be performed. We propose an optimizing method of reducing overheads for consistency management. The algorithm coalesces a sequence of consistency management instructions statically/dynamically. We have implemented the prototype of the compiler and the runtime system for the ADSM on a multicomputer Fujitsu AP1000+ and the general-purpose massively-parallel operating system: SSS-CORE. The performance evaluation using LU-Contig of SPLASH-2 shows that the execution time is reduced by 80% using static optimization and it is further reduced by 30% using dynamic optimization., 28 Oct. 1997, 1997, 102, 91, 96
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Performance evaluation of a run-time restructuring mechanism on On-Chip MIMD, TAMATSUKURI JUNJI; MATSUMOTO TAKASHI; HIRAKI KEI, At present, the speed-up of microprocessors based on superscalar architecture has hit a ceiling. We have proposed a run-time restructuring architecture to utilize the large hardware resources made available by the increasing integration of current VLSI technology. Our system speculatively exploits dynamic parallelism among loop blocks, a larger granularity than that of current instruction-level speculation. Loop-level parallelism requires more resources than instruction-level parallelism, but can also yield higher performance. In our run-time restructuring mechanism, on-chip MIMD microprocessors dynamically analyze a sequential binary executable and restructure it to execute each loop body speculatively. In this paper, we evaluate the performance improvement of run-time restructuring on on-chip MIMD microprocessors, using the SPEC95 benchmark suite and graphics application kernels consisting of GIF, JPEG, and MPEG decompression routines., 28 Oct. 1997, 1997, 102, 73, 78
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), OCHA-7: Parallel Computer Based on Memory String Architecture, NIINO RYUTA; MATSUMOTO TAKASHI; HIRAKI KEI, The Memory String Architecture is a parallel system which connects memory chips and processor chips with fast serial links. We designed and implemented OCHA-7, which is based on this architecture. These memory chips and processor chips are implemented at board level on OCHA-7. In this paper, we compare the original model of the Memory String Architecture with its board-level simulation. Then we explain the structure of OCHA-7. Finally, we describe the structure of the Memory String Architecture as implemented on OCHA-7., 20 Aug. 1997, 1997, 76, 151, 156
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Large-scale speculative parallel execution mechanism on On-Chip MIMD, TAMATSUKURI JUNJI; MATSUMOTO TAKASHI; HIRAKI KEI, To exploit the large hardware resources given by the increased number of transistors integrated on one VLSI chip, we pursue the much larger granularity of loop-level speculative execution compared with current instruction-level parallelism (ILP). Loop-level parallelism needs more resources than ILP but is able to achieve higher performance. We have already proposed a parallel microprocessor architecture based on On-Chip MIMD. The architecture can take a binary program compiled for a single sequential microprocessor, analyze it at run time, and restructure it for parallel execution. The restructured program can be executed by duplicate speculative execution. Through this parallel execution, we have shown the feasibility of a binary-compatible parallel microprocessor. In this paper, we compare our approach to loop-level speculative execution, in which element processors execute restructured programs, with other approaches, such as continually forking speculative threads like a pipeline or producing a control thread for speculation. We also show a way to resolve control structures contained in the innermost loop., 20 Aug. 1997, 1997, 76, 139, 144
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Performance Evaluation of Gigabit Channel for High Functional Network System, KUNISAWA RYOTA; MATSUMOTO TAKASHI; HIRAKI KEI, On a multi-user, multi-job parallel environment built upon workstation clusters, a fast user-level communication and synchronization method is needed. We are developing a high-speed, enhanced gigabit switching network system which cooperates with our general-purpose massively parallel operating system. Memory-based communication is the basic method for user-level communication. We have implemented memory-based communication on an existing operating system for testing our network interface card, and evaluate its performance. We also describe the mechanisms required of the operating system to realize fast memory-based communication., 19 Aug. 1997, 1997, 75, 67, 72
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Load balancing with SSS-Server, SASAKI Shigero; KAMESAWA Hiroyuki; MATSUMOTO Takashi; HIRAKI Kei, In distributed processing, it is important to make the most of all resources on the network. We need to grasp the state of resources on the network to execute applications effectively, so the key to effective parallel distributed execution is exact and practically inexpensive information about the network. SSS-Server offers information on the network system and supports gathering load information. In this study, SDA, which distributes jobs with dependencies, is proposed, and the effect of SSS-Server is evaluated., 19 Aug. 1997, 97, 225, 47, 54
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Performance of Memory-Based Communication Facilities Using Fast Ethernet (100BaseTX), MATSUMOTO Takashi; HIRAKI Kei, In general-purpose parallel and distributed systems, the performance of protected and virtualized user-level communications/synchronizations is the most crucial issue in realizing efficient execution environments. We proposed a novel high-speed user-level communications/synchronizations scheme, "Memory-Based Communication Facilities (MBCF)", suitable for general-purpose systems with off-the-shelf communication hardware. This paper describes the packet formats of the MBCF with 100baseTX communication interfaces. Next, the paper shows the basic performance of the MBCF/100baseTX using test programs and a logic analyzer which measures the waveforms of the 100baseTX interface. Finally, we develop another MBCF interface on UDP/IP in conventional operating systems and compare the performance of our original MBCF with that of the MBCF/UDP., 19 Aug. 1997, 97, 225, 109, 116
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Protection and Virtualization of Resources in a General-Purpose Parallel Operating System, MATSUMOTO TAKASHI; HIRAKI KEI, A general-purpose operating system for parallel systems must satisfy two capabilities that contradict each other: realizing protected and time-shared execution environment, and providing efficient parallel-execution environment. In parallel executions with a general-purpose operating system, performance of the protected and virtualized user-level communications/synchronizations is the most crucial issue. We proposed a novel high-speed user-level communications/synchronizations scheme "Memory-Based Communication Facilities (MBCF)" suitable for the general-purpose parallel operating system with off-the-shelf communication-hardware. For achieving high-performance, MBCF adopts the direct remote-accesses to destination user-level memory-space without address checks. In this paper, we discuss aspects of protection and security on MBCF. We conclude MBCF is qualified for not only parallel processing but also server-client distributed-computations which require strict protection and security., 05 Jun. 1997, 97, 86, 37, 42
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Memory String Architecture -Beyond the Memory Wall-, MATSUMOTO TAKASHI; HIRAKI KEI, We adopt high-speed serial links for processor-memory connections in order to solve the memory wall problem. Therefore we propose two novel devices that have high-speed serial interfaces: the Multi-ported Serial-Access Memory (MSAM), which is an extended DRAM chip, and the Multi-ported Serial-access Processor (MSP), which includes MSAM using combined DRAM and logic technology. Finally, the Memory String Architecture, which consists of MSP chips and MSAM chips, is introduced and discussed., 31 Oct. 1996, 1996, 106, 1, 6
  • An Evaluation of Scheduling Methods for General Purpose Operating System SSS-CORE, 04 Sep. 1996, 53, 95, 96
  • An Architecture for speculative loop execution with Binary compatibility, 04 Sep. 1996, 53, 117, 118
  • Memory-Based Communication Facilities of the General-Purpose Massively-Parallel Operating System: SSS-CORE, 04 Sep. 1996, 53, 37, 38
  • Parallelization of Traversal Loop for Dynamically Allocated Objects, 04 Sep. 1996, 53, 335, 336
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Compiler Oriented Implementation of Shared Object Space on Distributed Memory, NIWA JUNPEI; INAGAKI TATSUSHI; MATSUMOTO TAKASHI; HIRAKI KEI, On a distributed memory parallel machine, much effort is needed to write applications that deal with dynamic and complex data structures using a message passing library. To reduce this difficulty, it is necessary for a language or a runtime system to provide a shared name space. In this paper, we describe how to provide a software shared name space based on objects. Existing systems leave the description of low-level communication to users. We propose that the compiler analyze the code and support the description of low-level communication. Furthermore, we propose that the compiler generate many descriptions of the communication and use the suitable one as the case may be, which results in speedup. We developed a prototype system running on the AP1000+ and evaluated our approach, which exhibits good speedup., 28 Aug. 1996, 1996, 81, 7, 12
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Server for gathering remote resource's status SSS-Server, KAMESAWA HIROYUKI; MATSUMOTO TAKASHI; HIRAKI KEI, This paper presents SSS-Server, which is based on the information delivering system of SSS-CORE [1] [Matsumoto, 94], a general-purpose parallel operating system, and is implemented on commonly used operating systems. The information delivering system of SSS-CORE is characterized by (1) eager information gathering and (2) efficient information delivery using the hierarchical structure of the network system. This paper presents the algorithms and implementation of the information delivering server, SSS-Server, which shares those characteristics. As it is implemented on commonly used operating systems, SSS-Server enables hosts with various operating systems to share information, and supports the development of applications that use remote resources. "rwho" is shown as a sample., 28 Aug. 1996, 96, 82, 103, 108
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Todai Protocol: A High-Speed Snoop Protocol Suitable for High-Functional Distributed Shared Memory Systems, MATSUMOTO TAKASHI; HIRAKI KEI, We propose a novel snoop protocol, the "Todai Protocol", which is implemented with only single-ported memory chips. The Todai Protocol is suitable for high-speed pipelined split-phase buses. We describe high-speed implementation techniques for the Todai Protocol and also mention methods for extending the protocol to a cluster of distributed shared memory systems., 27 Aug. 1996, 1996, 80, 227, 232
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Gigabit Network with Cooperative Functions for General Purpose Massively Parallel OS, KUNISAWA Ryota; MATSUMOTO Takashi; HIRAKI Kei, We are developing a high-speed, enhanced gigabit switching network system which cooperates with our general-purpose massively parallel operating system. Communication overhead among user programs under a multi-user/multi-job environment is reduced by supporting hardware, so we can have an efficient parallel execution environment on networks of workstations. We have implemented network interface hardware for Sun workstations, and describe its key functions in this paper., 27 Aug. 1996, 1996, 80, 83, 88
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), An Architecture for Speculative Parallel Execution with Loop Analyzer, TAMATSUKURI JUNJI; MATSUMOTO TAKASHI; HIRAKI KEI, We propose a new architecture which dynamically analyzes the binary code of a pre-parallelized sequential loop and executes it in parallel. To dynamically execute a sequential loop with dependencies among iterations in parallel, the architecture should have a dependency resolving mechanism in the processor. In this architecture, the possibility of speculative memory access can be detected by analyzing register production dependencies, and for control dependencies on branch instructions, parallel execution is realized by loop-level speculative execution. In this paper, we propose the dynamic dependency analyzing mechanism needed for dynamic loop-level parallel execution, and the multiple speculative execution mechanism, which realizes speculative execution on memory accesses and control instructions without increasing the processor resources that the loop analyzer releases via the Elastic Barrier. We describe our pilot model OCHA-Pro (On-CHip MIMD Architecture Processor) incorporating these mechanisms and show its execution performance by simulation., 27 Aug. 1996, 1996, 80, 61, 66
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Scheduling in General Purpose OS SSS-CORE -An Evaluation by Detailed Probabilistic Simulation-, NOBUKUNI YOJIRO; MATSUMOTO TAKASHI; HIRAKI KEI, Preventing parallel processes from unexpected inefficiencies is a major concern in constructing a multi-user/multi-job environment on distributed memory systems. Systems can achieve high performance by using scheduling policies which reflect resource consumption states. A general environment, which must support concurrent execution of multiple processes, needs a way to keep the system effective when physical memories are full. In distributed systems, memory pages can be classified by access frequency and by the cost required to access them after the target pages have been replaced. Selecting victim pages according to this classification may enhance system performance. We built a probabilistic model with a concrete memory management scheme and differentiated memory access costs, and incorporated memory reference frequencies into it. The paper describes an evaluation, under this model, of scheduling policies using resource information for each process and of page replacement policies., 26 Aug. 1996, 1996, 79, 79, 84
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), A General-Purpose Massively-Parallel Operating System: SSS-CORE -Implementation Methods for Networks of Workstations-, MATSUMOTO TAKASHI; KOMAARASHI TAKETO; UZUHARA SHIGERU; TATEOKA SHOZO; HIRAKI KEI, We propose a novel memory sharing scheme, "Asymmetric Distributed Shared Memory (ADSM)", that realizes user-level protected, high-speed, high-functional communications/synchronizations. The ADSM scheme is developed for the general-purpose massively-parallel operating system SSS-CORE on Networks of Workstations (NOW) or distributed-memory multiprocessors. We describe two programming models and two execution models suitable for ADSM on SSS-CORE. We also introduce the Memory-Based Communication Facilities of SSS-CORE, which enable a high-speed implementation of ADSM. Next, we describe the implementation status of the current SSS-CORE on NOW. Finally, we show actual measurements of execution times and communication overheads of fine-grained applications on SSS-CORE., 26 Aug. 1996, 1996, 79, 115, 120
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Implementation of Distributed Shared Memory with Low Cost Hardware: OCHANOMIZ 5, TSUIKI Jun; TANAKA Kiyofumi; MATSUMOTO Takashi; HIRAKI Kei, Large-scale distributed shared memory systems require mechanisms for lowering the overheads of keeping caches coherent. When the number of clusters sharing the same variable becomes large, a great amount of memory is needed for recording the sharing cluster IDs (the directory). Efficient coherence control mechanisms and a scalable directory are essential for realizing large-scale distributed shared memory systems. We propose a distributed shared memory system with efficient coherence control mechanisms and a scalable directory, and evaluate the system using parameters of a prototype parallel processing system on which we are implementing it., 26 Aug. 1996, 96, 230, 55, 62
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Preliminary Performance Evaluation of General Purpose Parallel Computer Prototype OCHANOMIZ-5, TANAKA Kiyofumi; TSUIKI Jun; MATSUMOTO Takashi; HIRAKI Kei, For the purpose of developing and supporting architecture for a general-purpose massively parallel computer of the next generation, it is effective to actually build a prototype machine and verify its mechanisms. From this point of view, we have built a parallel computer prototype, OCHANOMIZ-5. This machine currently consists of 8 processors and runs with distributed memories. In this paper, we describe the implementation framework of OCHANOMIZ-5 and preliminarily evaluate its performance with real applications., 26 Aug. 1996, 96, 230, 47, 54
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Code Generation for Fine-Grained Parallel Processing utilizing Memory Based Synchronization Bits, Inagaki Tatsushi; Matsumoto Takashi; Hiraki Kei, We describe a code generation method for efficient fine-grained parallel processing at the iteration and instruction levels that utilizes memory-based synchronization bits. Our target system provides the MISC (A Mechanism for Integrated Synchronization and Communication) mechanism to realize atomic synchronization and communication at the memory access level. To speed up DOACROSS loops, it is important to exploit fine-grained parallelism within an iteration and among iterations. We use two different levels of parallel processing according to the parallelism lying in the task graph of a DOACROSS loop., 24 Aug. 1995, 95, 82, 49, 56
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), A Vector-Loading Supporting Mechanism for Usual Cache-based Processors, OOTSU Kanemitsu; MATSUMOTO Takashi; HIRAKI Kei, Large data sets in practical application programs degrade the performance of current cache-based computer systems, since the cache memories cannot hold the whole data set. For this problem, it is quite effective to fetch the necessary data in advance and, using its regularity, rearrange it into forms that the cache memories can handle easily. The Global Structure Pre-Fetching (GSPF) mechanism prefetches target data into a local buffer near the processor and returns the data stored in the local buffer to the processor immediately after the processor requests it. In this paper, the GSPF mechanism, which has been implemented on the parallel processor OCHANOMIZ-1, is explained, and performance evaluations by simulation are shown., 23 Aug. 1995, 1995, 80, 177, 184
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Scalable Parallel Processing System Prototype: OCHANOMIZ 5, TSUIKI Jun; TANAKA Kiyofumi; MATSUMOTO Takashi; HIRAKI Kei, Mechanisms for supporting efficient use under a general environment, and scalability, are necessary for future general-purpose parallel processing systems, so that they can replace existing sequential processing systems. We designed OCHANOMIZ 5, a scalable parallel processing system with hardware-supported synchronization mechanisms, as a flexible and powerful prototype of future general-purpose parallel processing systems. In this paper, we describe the distributed shared memory, processor-based synchronization mechanisms, and memory-based synchronization mechanisms being implemented on OCHANOMIZ 5., 23 Aug. 1995, 1995, 80, 25, 32
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), An Implementation of a Disk I/O Subsystem for the Massively Parallel Computer JUMP-1, NAKANO Tomoyuki; NAKAJO Hironori; OKADA Tsutomu; MATSUMOTO Takashi; KOHATA Masaki; MATSUDA Hideo; HIRAKI Kei; KANEDA Yukio, The massively parallel computer JUMP-1 is a distributed shared-memory machine which consists of multiple clusters, each including processors for inter-processor communication and synchronization called MBPs, connected via an interconnection network called RDT. I/O units are connected to clusters via fast serial links called Serial, Transparent Asynchronous First-in First-out Links (STAFF-Links), and each I/O buffer is mapped into the global memory space. Thus, I/O accesses can be handled as memory read/write accesses. In this paper, we describe the implementation of a disk I/O unit and an evaluation of its performance., 23 Aug. 1995, 1995, 80, 137, 144
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Evaluation Of A Scheduling Method Using Resource Informations For General Purpose Parallel OS, NOBUKUNI Yojiro; MATSUMOTO Takashi; HIRAKI Kei, Parallel processing on parallel/distributed systems is showing greater availability as network-related microelectronics evolves and many optimization mechanisms are realized. This paper describes a kernel-level scheduling method for building a general-purpose parallel OS on NUMA-type parallel machines. In a distributed memory environment, constructing a multi-user/multi-job world without decreasing the high efficiency of parallel applications can be achieved by managing resource information and scheduling according to it. We simulated a simplified model and evaluated four scheduling methods., 23 Aug. 1995, 95, 210, 111, 118
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Reconfigurable High Functional Network of General Purpose Parallel Computer Prototype OCHANOMIZ-5, TANAKA Kiyofumi; TSUIKI Jun; MATSUMOTO Takashi; HIRAKI Kei, For the construction of a large-scale parallel computer system, we examine a method of providing an interconnection network with high functions, and give an outline of the prototype. The main objective of this study is to improve the total efficiency of the system by using a hierarchical network and letting each of its internal nodes have high-level functions through reconfiguration. OCHANOMIZ-5, which we are developing, is a parallel computer system with distributed memories. It consists of clusters in which two processors share a bus, and a hierarchical tree network combining those clusters. Each node of the network is constructed with a reconfigurable FPGA, and can deal with various tasks by changing itself depending on the application. On this implementation, we build a high-functional network capable of efficient hierarchical broadcasting, synchronization, and various sorts of computation., 22 Aug. 1995, 95, 209, 49, 56
  • SSS-CORE : Operating System Kernel for General Purpose Massively Parallel Machine, 20 Sep. 1994, 49, 61, 62
  • OCHANOMIZ-1: Performance Evaluation of Global Synchronization Mechanism, 20 Sep. 1994, 49, 27, 28
  • Instruction-set design of an MBP-core processor, 20 Sep. 1994, 49, 1, 2
  • Firmware Design on MBP-core : Support of Pseudo Fullmap Directory, 20 Sep. 1994, 49, 3, 4
  • Performance Evaluation of Global Structure Pre-Fetch Mechanism, 20 Sep. 1994, 49, 23, 24
  • A Compilation Technique using Memory Based Synchronization Bits, 20 Sep. 1994, 49, 29, 30
  • Evaluation of Pseudo Full Map System on RDT Network, 20 Sep. 1994, 49, 67, 68
  • Performance Evaluation of Memory System of a General Purpose Fine-Grained Parallel Processor OCHANOMIZ-1, 20 Sep. 1994, 49, 25, 26
  • Elastic Memory Consistency Models, 20 Sep. 1994, 49, 5, 6
  • Implementation of Pseudo Fullmap Directory Cache, 21 Jul. 1994, 1994, 66, 217, 224
  • An I/O Access Method for the Massively Parallel Computer JUMP - 1, 21 Jul. 1994, 1994, 66, 177, 184
  • Quantitative Evaluation of Pseudo - Fullmap Directory using Simulation, 21 Jul. 1994, 1994, 66, 201, 208
  • Evaluation of Pseudo Full Map System in Directory Cache, 07 Mar. 1994, 48, 33, 34
  • I/O Subsystem for the Massively Parallel Computer JUMP - 1, 27 Jan. 1994, 1994, 13, 105, 112
  • Performance Evaluation by Utility Applications of Mechanisms to Optimize Parallel Processing, 27 Jan. 1994, 1994, 13, 41, 48
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, An Optimizing Compiler for Hierarchical Fine-Grain Parallelism, Inagaki Tatsushi; Matsumoto Takashi; Hiraki Kei, Fine-grain parallel processing can remove the bottlenecks of coarse-grain parallel processing. For fine-grain parallel processing, we must quantitatively analyze the parallelism in a given program, the parallelism in a target machine, and the costs of computation, communication, and synchronization on the machine. We developed an optimizing compiler, OP.1 (Optimizing Parallelizer), that uses static task scheduling. OP.1 exploits intra- and inter-processor fine-grain parallelism. It duplicates predecessor tasks using DSH (Duplication Scheduling Heuristics) and generates object code for fine-grain support mechanisms. This paper describes the scheduling method and the code-generation method for synchronization, and evaluates the generated code using various benchmark programs., 1994, 94, 105, 112
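The task-duplication idea behind DSH-style scheduling in the abstract above can be illustrated with a toy two-processor example. This is only a sketch of the general technique; the task names and costs are hypothetical, not taken from OP.1.

```python
# Toy illustration of task duplication as used by DSH-style schedulers:
# if a consumer task must wait for a remote predecessor's result, copying
# (duplicating) that predecessor onto the consumer's processor can start
# the consumer earlier, trading recomputation for communication.
COMPUTE = {"A": 2, "B": 3}   # execution cost of each task (B depends on A)
COMM = 5                      # cost of sending A's result between processors

def finish_time_without_duplication():
    # A runs on P0; B runs on P1 and waits for A's result to arrive.
    a_done_on_p0 = COMPUTE["A"]
    b_start = a_done_on_p0 + COMM
    return b_start + COMPUTE["B"]

def finish_time_with_duplication():
    # B's processor re-executes A locally, so no message is needed.
    a_done_on_p1 = COMPUTE["A"]
    return a_done_on_p1 + COMPUTE["B"]

print(finish_time_without_duplication())  # -> 10
print(finish_time_with_duplication())     # -> 5
```

Duplication pays off whenever the predecessor's recomputation cost is lower than the communication delay it removes, which is exactly the fine-grain regime these papers target.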
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Performance Evaluation of Interconnection Networks Using Optimized Parallel Application Codes, Takemoto Michiharu; Matsumoto Takashi; Hiraki Kei, Parallelization is key to meeting the recent demand for high-speed computing. Interconnection networks are important in parallel computing systems, since communication and synchronization among the processing elements must be implemented efficiently. Although they have been studied from algorithmic angles, they have not been evaluated using application codes with latency-hiding techniques. We evaluate interconnection networks with application codes: SOR, matrix-vector multiplication, and FFT computations. We show that the performance degradation caused by a mismatch between the network topology and the communication pattern can be compensated for by optimizing the application programs., 16 Nov. 1993, 93, 320, 65, 72
  • Network Topology Simulator for Massively Parallel Computers, 27 Sep. 1993, 47, 179, 180
  • Extended Snoopy Spin Wait and Hierarchical Elastic Barrier, 27 Sep. 1993, 47, 43, 44
  • Performance Evaluation by Parallel Utility Applications on an Execution-Driven Simulator, 27 Sep. 1993, 47, 47, 48
  • An Optimizing Compiler for a General Purpose Fine-Grained Parallel Processor OCHANOMIZ-1, 27 Sep. 1993, 47, 59, 60
  • Basic concept of a general purpose fine-grained parallel processor Ochanomiz-1, 27 Sep. 1993, 47, 55, 56
  • Global Synchronization Mechanism of Fine-Grained Parallel Processor OCHANOMIZ-1, 27 Sep. 1993, 47, 61, 62
  • Global Structure Pre-Fetch Mechanism of a General Purpose Fine-Grained Parallel Processor OCHANOMIZ-1, 27 Sep. 1993, 47, 57, 58
  • Memory-Based Data-Driven Synchronization Mechanism of a General Purpose Fine-Grained Parallel Processor Ochanomiz-1, 27 Sep. 1993, 47, 63, 64
  • Memory Access Localization for Shared Memory Multiprocessors, 27 Sep. 1993, 47, 161, 162
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Cache Injection and High-Performance Memory-Based Synchronization Mechanisms, MATSUMOTO T.; Hiraki Kei, In this paper, we propose the concept of cache injection. Cache injection is the act of placing data into a processor's cache by an external agent. More generally, the initiator of a data transmission can arbitrarily specify multiple caches as targets of the injection. The cache-injection technique is useful for implementing various basic mechanisms used in parallel processing systems, such as lightweight message passing, latency hiding/reduction through a decoupled-architecture approach, and efficient macro-dataflow execution on conventional microprocessors. We then describe the merits of memory-based synchronization mechanisms and strategies for improving their performance. Implementation methods for the proposed mechanisms on the D-machine (tentative name) of the Japan University Massively Parallel Processing project are described. The performance of memory-based synchronization mechanisms can be improved by caching with some special treatment, and these methods are presented. Finally, application examples of cache injection and memory-based synchronization are discussed., 19 Aug. 1993, 1993, 71, 113, 120
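Memory-based, producer-consumer synchronization of the kind discussed above can be modeled in software with a presence ("full") bit per word: a read blocks until a write marks the word full. The following is a minimal illustrative model, not the paper's hardware design; the class name and values are hypothetical.

```python
# Software model of a full/empty-bit synchronized memory word: the
# consumer's read blocks until the producer's write sets the full bit,
# unifying data transfer and synchronization in one operation.
import threading

class SyncWord:
    def __init__(self):
        self._cond = threading.Condition()
        self._full = False          # the presence ("full") bit
        self._value = None

    def write(self, value):
        # Producer: store the value, mark the word full, wake waiters.
        with self._cond:
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read(self):
        # Consumer: wait until the word is full, then return the value.
        with self._cond:
            while not self._full:
                self._cond.wait()
            return self._value

word = SyncWord()
threading.Thread(target=lambda: word.write(42)).start()
print(word.read())  # -> 42
```

In hardware, the point is that the consumer neither polls a separate flag nor takes an interrupt: the synchronization rides on the memory access itself.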
  • IPSJ SIG Notes, Information Processing Society of Japan (IPSJ), Architecture and evaluation of OCHANOMIZ-1, Nakazato Gaku; Ootsu Kanemitsu; Totsuka Yonetaro; Matsumoto Takashi; Hiraki Kei, We report on the general-purpose fine-grain multiprocessor OCHANOMIZ-1. Conventional bus-connected multiprocessors have no facilities to support lightweight synchronization or communication, and therefore cannot handle instruction-level parallelism efficiently. OCHANOMIZ-1 uses commercial high-performance microprocessors as its processing elements and FPGAs to support fine-grain parallel processing. These FPGAs enable us to implement various types of mechanisms fitted to applications. Our current implementation of fine-grain support mechanisms comprises: 1) Elastic Barrier, which provides processor synchronization with little overhead; 2) Data-Driven Synchronization, which unifies producer-consumer data transfer and synchronization; and 3) a Global Structure Pre-Fetching Mechanism, which efficiently transfers large array data (possibly with non-unit stride) into contiguous cache lines. In this paper, the overall structure of OCHANOMIZ-1 and the implementation of these mechanisms are described. We also discuss the impact of these mechanisms on fine-grain parallel processing., 19 Aug. 1993, 1993, 71, 57, 64
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Composite Parallel Processing Architecture with Two Different Processing Elements with Different Grain Sizes, Hiraki Kei; Matsumoto Takashi, In this paper, a basic architecture for efficient massively parallel processing is discussed. In order to construct general-purpose massively parallel processing systems, efficient and close interaction between processing elements is a central issue. We propose a composite architecture with two different processing elements, each optimized for a different grain size (fine-grain and coarse-grain). The proposed architecture can exploit the high performance of coarse-grained RISC processors in connection with flexible fine-grained operations such as virtually shared memory, versatile synchronization, and message communication. After a detailed discussion, we describe the architecture of the prototype machine (the D-machine)., 18 Aug. 1993, 93, 180, 1, 8
  • IEICE technical report. Computer systems, The Institute of Electronics, Information and Communication Engineers, Evaluation of Network Topology on Application Codes Using Latency Hiding Techniques, Takemoto Michiharu; Matsumoto Takashi; Hiraki Kei, The current key to successful high-performance computing is building massively parallel computers. When we develop a massively parallel computer, we must consider interconnection networks, because communication and synchronization among the processing elements are important. They have been well studied theoretically, but without actual costs, and existing evaluations do not apply latency-hiding techniques. The technique of overlapping communication and computation is influential in achieving high performance. The constraints and demands on interconnection networks may change if we evaluate them with this technique. We constructed a simulator with facilities to change and evaluate networks. We run application codes that use optimization techniques for static latency hiding and evaluate the interconnection network topology., 18 Aug. 1993, 93, 180, 113, 120
  • IPSJ Journal, Information Processing Society of Japan (IPSJ), Evaluation of PHIGS Geometry Processing on Multiprocessor Systems, MATSUMOTO Takashi; KAWASE Kei; MORIYAMA Takao, In present high-end graphics systems, geometric calculations for 3D graphic images are a bottleneck in system performance. To cope with this problem, parallel processing techniques are employed. However, in PHIGS, there are some obstacles to efficient parallel processing. We therefore proposed two mechanisms that enable shared-memory/shared-bus multiprocessors to perform efficient geometric calculations in PHIGS. To evaluate the quantitative effects of the mechanisms and estimate the influence of the overhead caused by job dispatching and bus contention, we have performed various simulations on an execution-driven multiprocessor simulator. In a master-slave dispatching model, as the number of slave processors increases, task dispatching by the master processor becomes the bottleneck. We therefore devised simple architectural support for task dispatch and confirmed that it significantly improves performance. In a no-master dispatching model, where all processors work symmetrically, the increase in bus traffic is the problem. We confirm that the all-read protocol reduces bus traffic and improves performance., 15 Apr. 1993, 34, 4, 732, 742
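The master-slave dispatch bottleneck described in this abstract can be illustrated with a back-of-the-envelope cost model. All numbers below are hypothetical and the model is deliberately crude; it only shows why speedup saturates when the master serializes dispatches.

```python
# Hypothetical cost model of master-slave dispatching: n tasks on n
# slaves, each task takes w time units, and the master needs d time
# units to dispatch one task, issuing dispatches one at a time.
def speedup(n, w=100.0, d=5.0):
    serial = n * w            # one processor executing every task in turn
    parallel = n * d + w      # last dispatch ends at n*d; that task runs w more
    return serial / parallel

print(round(speedup(4), 2))   # -> 3.33
print(round(speedup(64), 2))  # -> 15.24 (approaching the w/d = 20 ceiling)
```

As n grows, the parallel makespan is dominated by the n·d dispatch stream, so speedup tends to w/d no matter how many slaves are added; this is the bottleneck the paper's architectural dispatch support is meant to remove.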
  • Information Processing Society of Japan (IPSJ), OP.1: An Optimizing Parallelizer for Fine-Grain Multiprocessors, Inagaki Tatsushi; Matsumoto Takashi; Hiraki Kei, In fine-grain parallel processing, as processor elements become faster, considering communication and synchronization overheads becomes more important. This paper describes compilation techniques to reduce communication and synchronization costs in instruction-level fine-grain parallel processing within basic blocks of procedural languages, and presents implementation results of these techniques in an optimizing parallelizer (OP.1) for fine-grain multiprocessors. OP.1 adopts DSH (Duplication Scheduling Heuristic), a scheduling heuristic that duplicates preceding tasks to optimize communication overheads. OP.1 generates low-cost synchronization code that uses the Elastic Barrier mechanism., 1993, 13, 1, 7

Books etc

Presentations

  • 15 Jan. 2018, 15 Jan. 2018 - 15 Jan. 2018
  • 12 Sep. 2017, 12 Sep. 2017 - 12 Sep. 2017
  • 25 Aug. 2016, 25 Aug. 2016 - 26 Aug. 2016

Works

  • High-Performance Embedded SoC: JSTEP-3, Takashi Matsumoto
  • NRFS: Network Raid File System, Takashi Matsumoto
  • The Scalable Operating System SSS-PC, Takashi Matsumoto
  • The Scalable Operating SSS-CORE, Takashi Matsumoto

Awards

  • Oct. 1997
  • Mar. 1990

Industrial Property Rights

  • Patent right, Parallel and Distributed Computing System, Takashi MATSUMOTO, Japanese Patent Application 2022-540050, 11 Jun. 2021
  • Patent right, Parallel and Distributed Computing System, Takashi MATSUMOTO, Japanese Patent Application 2022-530631, 11 Jun. 2021
  • Patent right, Parallel and Distributed Computing System, Takashi MATSUMOTO, Japanese Patent Application 2020-114654, 02 Jul. 2020
  • Patent right, LSI Chip and Network System, Takashi MATSUMOTO, Nara Women's University, Japanese Patent Application 2015-20892, 05 Feb. 2015, Publication 2015-165656, 17 Sep. 2015, Patent No. 6580333, 06 Sep. 2019, 25 Sep. 2019
  • Patent right, Network Device, Network System, LSI Module and Conversion Module, Takashi MATSUMOTO, Nara Women's University, Japanese Patent Application 2014-81424, 10 Apr. 2014, Publication 2015-203885, 16 Nov. 2015
  • Patent right, Network System, Takashi MATSUMOTO, Nara Women's University, Japanese Patent Application 2014-49222, 12 Mar. 2014, Publication 2015-172906, 01 Oct. 2015
  • Patent right, LSI Chip and Network System, Takashi MATSUMOTO, Nara Women's University, Japanese Patent Application 2014-20896, 06 Feb. 2014
  • Patent right, Parallel and Distributed Computing System, Takashi MATSUMOTO, Japanese Patent Application 2022-142713
  • Patent right, Processor, Takashi MATSUMOTO, Japan Science and Technology Corporation, Japanese Patent Application H11-354203
  • Patent right, Access Method and Recording Medium Recording an Access Processing Program, Takashi MATSUMOTO, Japan Science and Technology Corporation, Japanese Patent Application H11-255272
  • Patent right, Multiprocessor memory managing system and method for executing sequentially renewed instructions by locking and alternately reading slave memories, Kawase, K; Matsumoto, T; Moriyama, T, IBM Corp., Japanese Patent Application H3-233749
  • Patent right, Multiprocessor system and process synchronization method therefor, Matsumoto, T, IBM Corp., Japanese Patent Application H1-277334
  • Patent right, Image display method and apparatus, Matsumoto, T, IBM Corp., Japanese Patent Application S63-285698
  • Patent right, Multiprocessor system having synchronization control mechanism, Fukuda, M; Matsumoto, T; Nakada, T, IBM Corp., Japanese Patent Application H1-57762
  • Patent right, Graphics system shadow generation using a depth buffer, Matsumoto, T, IBM Corp., Japanese Patent Application S63-224448

Research Projects

  • Apr. 2019 - Mar. 2020, Principal investigator, Research on Embedded SoC Clustering Technology, Takashi MATSUMOTO, electronics manufacturer, joint research, Nara Women's University
  • Apr. 2018 - Mar. 2019, Principal investigator, Research on Embedded SoC Clustering Technology, Takashi MATSUMOTO, electronics manufacturer, joint research, Nara Women's University
  • Apr. 2017 - Mar. 2018, Principal investigator, Research on Embedded SoC Clustering Technology, Takashi MATSUMOTO, electronics manufacturer, joint research, Nara Women's University
  • Apr. 2016 - Mar. 2017, Principal investigator, Demonstration Experiment for Early Training of Development Engineers, Takashi MATSUMOTO, software development company, joint research, Nara Women's University
  • Jun. 2015 - Mar. 2016, Principal investigator, Demonstration Experiment for Early Training of Development Engineers, Takashi MATSUMOTO, software development company, joint research, Nara Women's University
  • May 2014 - Mar. 2015, Principal investigator, Demonstration Experiment for Early Training of GPGPU Engineers, Takashi MATSUMOTO, software development company, joint research, Nara Women's University
  • Grant-in-Aid for Scientific Research (B), Apr. 2005 - Mar. 2008, Principal investigator, Research on Construction Methods for Server Systems with Scalability and Fault Tolerance, Takashi MATSUMOTO; Mitaro NAMIKI; Hironori NAKAJO; Takayuki FUJINO; Shoichiro ASANO, Japan Society for the Promotion of Science, FY2005 Grant-in-Aid for Scientific Research, National Institute of Informatics
  • Oct. 2001 - Sep. 2004, Principal investigator, High-Performance Embedded Microprocessor, Takashi MATSUMOTO; Kiyofumi TANAKA, Japan Science and Technology Corporation, New-Business-Oriented R&D Results Deployment Program
  • Apr. 2001 - Dec. 2003, Principal investigator, Development of the Next-Generation Operating System SSS-PC, Takashi MATSUMOTO, Information-technology Promotion Agency, IPA Information Technology Development Support Program
  • Jul. 2001 - Feb. 2002, Principal investigator, Commercialization of the Network RAID File System for Linux, Takashi MATSUMOTO, Information-technology Promotion Agency, Exploratory Software Project
  • Oct. 1998 - Sep. 2001, Principal investigator, Research on Resource Allocation Methods Supporting Autonomous Optimization, Takashi MATSUMOTO, Japan Science and Technology Corporation, PRESTO "Information and Knowledge" Research Area, The University of Tokyo
  • Grant-in-Aid for Encouragement of Young Scientists (A), Apr. 1999 - Mar. 2001, Principal investigator, Research on Communication Optimization of Shared-Memory Parallel Programs, Takashi MATSUMOTO, Japan Society for the Promotion of Science, FY1999 Grant-in-Aid for Scientific Research, The University of Tokyo
  • Oct. 2000 - Feb. 2001, Principal investigator, Development of a Network RAID File System, Takashi MATSUMOTO, Information-technology Promotion Agency, Exploratory Software Project
  • Apr. 1998 - Feb. 2001, Coinvestigator, Research on a Scalable Distributed Server Environment, Takashi MATSUMOTO, Information-technology Promotion Agency, Development under the Creative Information Technology Development Program
  • Apr. 1999 - Jan. 2000, Coinvestigator, Research and Development of Next-Generation Network Construction Methods Based on the Memory-Based Concept, Takashi MATSUMOTO, Information-technology Promotion Agency, Next-Generation Digital Application Infrastructure Technology Development Program
  • Grant-in-Aid for Encouragement of Young Scientists (A), Apr. 1997 - Mar. 1999, Principal investigator, Research on Software Memory-Based Communication Mechanisms, Takashi MATSUMOTO, Japan Society for the Promotion of Science, FY1997 Grant-in-Aid for Scientific Research, The University of Tokyo
  • Apr. 1995 - Feb. 1998, Coinvestigator, Research on the Massively Parallel Operating System Kernel SSS-CORE, Takashi MATSUMOTO, Information-technology Promotion Agency, Development under the Creative Information Technology Development Program
  • Grant-in-Aid for Encouragement of Young Scientists (A), Apr. 1996 - Mar. 1997, Principal investigator, Research on Generalized Combining Mechanisms, Takashi MATSUMOTO, Japan Society for the Promotion of Science, FY1996 Grant-in-Aid for Scientific Research, The University of Tokyo
  • Grant-in-Aid for Encouragement of Young Scientists (A), Apr. 1995 - Mar. 1996, Principal investigator, Research on Elastic Memory Consistency Models, Takashi MATSUMOTO, Japan Society for the Promotion of Science, FY1995 Grant-in-Aid for Scientific Research, The University of Tokyo
  • Grant-in-Aid for Encouragement of Young Scientists (A), Apr. 1994 - Mar. 1995, Principal investigator, Quantitative Evaluation of Cache Injection Mechanisms, Takashi MATSUMOTO, Japan Society for the Promotion of Science, FY1994 Grant-in-Aid for Scientific Research, The University of Tokyo
  • Jul. 1994 - Feb. 1995, Coinvestigator, Research on the Massively Parallel Operating System Kernel SSS-CORE, Takashi MATSUMOTO, Information-technology Promotion Agency, Development under the Creative Information Technology Development Program
  • Grant-in-Aid for Encouragement of Young Scientists (A), Apr. 1993 - Mar. 1994, Principal investigator, Performance Evaluation of Elastic Barrier on Tightly Coupled Multiprocessors, Takashi MATSUMOTO, Japan Society for the Promotion of Science, FY1993 Grant-in-Aid for Scientific Research, The University of Tokyo
  • Apr. 2021 - Mar. 2022, Principal investigator, Research on Embedded SoC Clustering Technology, electronics manufacturer, joint research
  • Apr. 2020 - Mar. 2021, Principal investigator, Research on Embedded SoC Clustering Technology, electronics manufacturer, joint research

■III. Social Collaboration Activities

1. Committee memberships in public organizations (councils, national examination committees, external university evaluation committees, KAKENHI review committees, etc.)

  • Japan Society for the Promotion of Science, Expert Member, Grants-in-Aid Committee, Informatics Subcommittee, Second Review Section, Jan. 2016 - Dec. 2016, Society
  • Japan Society for the Promotion of Science, Expert Member, Grants-in-Aid Committee, Informatics Subcommittee, Second Review Section, Jan. 2015 - Dec. 2015, Society