TACC supports research on a wide range of machine learning applications with high-performance, scalable infrastructure at both the software and hardware levels. TACC is tailored to the workflow of machine learning applications, giving you a more efficient way to manage, deploy, and scale compute-intensive machine learning jobs on a computing cluster.
Unified Interface
The unified interface processes user job profiles for the machine learning cluster and offers tools for job monitoring, output retrieval, and log streaming. It lets users submit jobs from their local environments and runs environment provisioning scripts before the job is handed to the scheduler.
ML Systems
ML frameworks optimize model development with state-of-the-art parallelization and distributed training techniques. These strategies improve data processing and algorithm execution, enabling efficient handling of complex computations and large datasets.
Cluster Scheduler
The cluster scheduler optimizes resource allocation in the machine learning cluster by analyzing job performance factors such as completion time and resource utilization. This critical component improves overall cluster efficiency and throughput.
AI-centric Networking
Our research enhances AI workloads by efficiently managing the transport of large models and utilizing SmartNICs for compute offloading. This setup optimizes data flow and reduces latency, ensuring high performance and scalability for AI applications.
2 TACC-managed clusters
Active TACC users since 2021/05
24,303 tasks processed on TACC clusters
235,739 GPU hours used for ML tasks
Call for Pioneers
Early-adopter applications are now open! Join now and boost your AI research.
TACC at HKUST
At HKUST, TACC manages clusters with over 160 GPUs for research and education in machine learning, openly accessible to the research community.
Since the beginning of 2023, the number of active TACC users has grown by 84% to 397, and the number of processed ML tasks has risen by 115% to 24,303.
TACC supports over 40 research projects and has so far been cited 22 times in papers at top conferences, including SIGMOD, KDD, CVPR, and UbiComp.
By researchers, for researchers
Adopt TACC in your cluster for an advanced approach to cluster management and task handling that improves efficiency and reliability.
Our research-backed solution includes comprehensive hardware monitoring, streamlined maintenance, and efficient job scheduling and execution.
High Stability
TACC enhances operational reliability by providing continuous 24/7 hardware monitoring, robust containerization options, and equitable and efficient job scheduling mechanisms.
Enhanced Usability
Task provisioning and management are effortless with TACC. Users can submit and monitor tasks via the command line, a web UI, or an API, making the system highly accessible.
Maximized Performance
TACC incorporates state-of-the-art ML systems and networking technologies from academic research, tailored to maximize performance and scalability in AI applications.