Biography

Weijia Shang received a BS degree in computer engineering from Changsha Institute of Technology, China, and Master's and Ph.D. degrees in computer engineering from Purdue University, West Lafayette, Indiana. She joined Santa Clara University in January 1994. Before that, she was on the faculty of the Center for Advanced Computer Studies, University of Southwestern Louisiana, for three and a half years. She received a Research Initiation Award in 1991 and a Career Award in 1995 from the National Science Foundation. She was a Clare Boothe Luce Professor between 1994 and 2000. Her research interests include parallel processing, computer architecture, parallelizing compilers, algorithm theory, and non-linear optimization.

Education

Computer Engineering, Ph.D. May, 1990 Purdue University, W. Lafayette, IN 47907

Computer Engineering, M.S. May, 1984 Purdue University, W. Lafayette, IN 47907

Computer Engineering, B.S. January, 1982 Changsha Institute of Technology, Changsha, China

Research

- Parallel processing
- Computer architecture
- Parallelizing compiler techniques
- Algorithms

Courses Taught

- Computer Architecture
- Algorithms

Publications

**Journal Publications:**

- 1. Y. Chen and W. Shang, "Supernode transformation on GPGPUs," International Journal of Parallel, Emergent and Distributed Systems, March 16, 2017, pp. 1-22.
- 2. H. Cui, X. Su, and W. Shang, "On Optimal Media/Video Distribution in Closed P2P-Based IPTV Networks," Computer Networks, Volume 60, 26 February 2014, Pages 217–232.
- 3. C. Neely, G. Brebner, W. Shang, “ReShape: Towards a High-Level Approach to Design and Operation of Modular Reconfigurable Systems,” ACM Transactions on Reconfigurable Technology and Systems (TRETS), vol. 6, issue 1, May 2013.
- 4. J. Steinbrecher, C. J. Philippidis and W. Shang, "A Case Study of Implementing Supernode Transformations," International Journal of Parallel Programming, May 2013.
- 5. C. Philippidis and W. Shang “On Minimizing Register Usage of Linearly Scheduled Algorithms with Uniform Dependencies,” Journal of Computer Languages, Systems and Structures, pp. 250-267, vol. 36, October 2010.
- 6. J. Zhang, X. Yi, N. Ling, and W. Shang, “Context Adaptive Lagrange Multiplier (CALM) for Rate-Distortion Optimal Motion Estimation in Video Coding,” IEEE Transactions on Circuits and Systems for Video Technology, June 2010, pp. 820-828.
- 7. S. Schaeckeler and W. Shang, "Optimizing the Stack of Recursive Functions". In Journal of Computer Languages, Systems & Structures, Elsevier, October 2009.
- 8. S. Subha, W. Shang: “A Hybrid Data Cache with Improved Locality and Replacement,” International Journal of Computational Science, December 2008, Vol 2, No. 6, pages 709-736.
- 9. S. Schaeckeler, W. Shang and R. Davis, "Compiler Optimization Pass Visualization: The Procedural Abstraction Case,” ACM Transactions on Computing Education, Vol. 9, June 2009, pages 1-13, New York, NY, USA.
- 10. S. Schaeckeler, W. Shang and R. Davis, "Visualization of Procedural Abstraction," In Electronic Notes in Theoretical Computer Science, Vol. 224, pages 27-39, 2009.
- 11. S. Schaeckeler and W. Shang, "Optimizing the Stack of Recursive Functions". In Journal of Computer Languages, Systems & Structures, Elsevier, accepted in April 2008 (in press).
- 12. Jun Zhang, Xiaoquan Yi, Nam Ling, and Weijia Shang, "Bit rate distribution for motion estimation in H.264 coding", IEEE Transactions on Consumer Electronics, Vol. 52, pp. 606-610, May 2006.
- 13. J. Dou, W. Shang and Z. Chen, “Nanometer Size Recrystallization Grain Induced by Applied Electric Field,” Applied Surface Science 236(2004) 57-62, June 2004.
- 14. J. Dou, W. Shang, and Z. Chen, “Formation Processes of Fine Structures Induced by High Electric Fields,” Journal of Vacuum Science and Technology B22 (2), March/April 2004.
- 15. J. Dou, C. Zhu, E. Chen, D. Yang, and W. Shang, “Studies on the String Streams of Electrons between Crystal Faces with a HRTFEM,” Microscopy and Analysis, July, 2003, p. 13.
- 16. J. Dou, W. Shang, X. Zhang, and Z. Chen, “Surface Reconstruction Induced Electrically in Two Steps,” Surface Science, 539 (2003), L519-L524
- 17. E. Hodzic, and W. Shang, "On Time Optimal Supernode Shape," IEEE Transactions on Parallel and Distributed Systems, December 2002, pp. 1220-1233 (regular).
- 18. R. Grover, W. Shang, Q. Li, “Bit-Level Two’s Complement Matrix Multiplication,” Integration, the VLSI Journal, Elsevier Science, December 2002, vol. 33/1-2, pp. 3-21.
- 19. E. Hodzic, and W. Shang, "On Supernode Transformation with Minimized Total Running Time," IEEE Transactions on Parallel and Distributed Systems, May 1998, pp. 417-428 (regular).
- 20. W. Shang, E. Hodzic, and Z. Chen, "On Uniformization of Affine Dependence Algorithms," IEEE Transactions on Computers, Vol. 45, No. 7, July 1996, pp. 827-840 (regular).
- 21. W. Shang, M.T. O'Keefe and J.A.B. Fortes, "On Loop Transformations for Generalized Cycle Shrinking,” IEEE Trans. on Parallel and Distributed Systems, Feb. 1994, pp. 193-204 (regular).
- 22. W. Shang and J.A.B. Fortes, "Time Optimal Linear Schedules for Algorithms with Uniform Dependencies," IEEE Trans. on Computers, Vol. 40, No. 6, June 1991, pp. 723-742 (regular).
- 23. W. Shang and J.A.B. Fortes, "On Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays,” IEEE Trans. on Parallel and Distributed Systems, Vol. 3, No. 3, May 1992, pp. 350-363 (regular).
- 24. W. Shang and J.A.B. Fortes, "Independent Partitioning of Algorithms with Uniform Dependencies,” IEEE Trans. on Computers, Vol. 41, No. 2, February 1992, pp. 190-206 (regular).
- 25. W. Shang and J.A.B. Fortes, "On Optimality of Linear Schedules," Journal of VLSI Signal Processing, 1, 1989, Kluwer Academic Publishers, Boston, pp. 209-220 (regular).
- 26. B.W. Wah, M. Aboelaze and W. Shang, "Systematic Design of Macropipelines of Systolic Arrays," Journal of Parallel and Distributed Computing, 5, 1988, pp. 1-25 (regular).

**Conference Publications:**

- 1. H. Kou, W. Shang, "Parallelized feature extraction and acoustic model training", Digital Signal Processing (DSP), 2014 19th International Conference (DSP2014, IEEE & IET), Aug 2014.
- 2. H. Kou, W. Shang, J. Chong, I. Lane "Optimized MFCC Feature Extraction on GPU", The 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP & IEEE), May 2013.
- 3. H. Kou, W. Shang, J. Chong, I. Lane "Efficient MFCC feature extraction on Graphics Processing Units", 2013 Constantinides International Workshop on Signal Processing (CIWSP, IEEE & IET), Jan 2013.
- 4. H. Kou, W. Shang, J. Chong, I. Lane "GPU Based Feature Extraction Implementation", GPU Technology Conference (GTC), September, 2012.
- 5. J. Steinbrecher and W. Shang, "On Optimizing the Longest Common Subsequence Problem by Loop Unrolling Along Wavefronts", Parallel, Distributed and Network-Based Processing (PDP), 20th Euromicro International Conference on, pp.603-611, Feb. 15-17, 2012, acceptance rate: 38%. http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6169647&isnumber=6169521
- 6. J. Steinbrecher and W. Shang, “On Supernode Transformations and Multithreading For The Longest Common Subsequence Problem”, Proc. Australasian Symposium on Parallel and Distributed Computing (AusPDC 2012), Melbourne, Australia, pp.3-12, Jan. 30 - Feb 2, 2012, acceptance rate: 30%, https://www.semanticscholar.org/paper/On-supernode-transformations-and-multithreading-for-Steinbrecher-Shang/bec45ffb61604fb768e92ee0b703f8a71eb22e48
- 7. J. Steinbrecher and W. Shang, "Exploiting Thread Level Parallelism by Loop Unrolling along Wavefronts", Parallel and Distributed Computing and Systems - 2011, Dallas, USA, paper 083, Dec. 14 - 16, 2011, acceptance rate: 50%, https://www.actapress.com/Abstract.aspx?paperId=453059
- 8. C. Neely, G. Brebner, W. Shang. “ShapeUp: A High-Level Design Approach to Simplify Module Interconnection on FPGAs”, Proc. 18th Annual IEEE Symposium on Custom Computing Machines, May 2010, pp.141-148.
- 9. C. Neely, G. Brebner, W. Shang. “Flexible and Modular Support for Timing Functions in High Performance Networking Acceleration”, Proc. 20th International Conference on Field Programmable Logic and Applications, August 2010, pp.513-518.
- 10. H. Cui, X. Su, and W. Shang, "Optimal Dissemination of Layered Videos in P2P-Based IPTV Networks," Proceedings of 2009 IEEE International Conference on Multimedia & Expo (ICME 2009), New York, USA, July 2009.
- 11. S. Schaeckeler and W. Shang, "Procedural Abstraction with Reverse Prefix Trees", CGO '09: Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization, Washington, USA, March 2009.
- 12. H. Cui, X. Su, and W. Shang, “An optimal media distribution algorithm in P2P-based IPTV,” in Proc. 3rd Int’l Conf. on Communications and Networking, China, Aug 2008.
- 13. S. Schaeckeler, W. Shang and R. Davis, "Visualization of Procedural Abstraction". In PVW '08: Proceedings of the 5th Program Visualization Workshop, Madrid, Spain, July 3, 2008.
- 14. S. Schaeckeler and W. Shang, "Live Range Splitting at Recursive Function Calls". In ITT '07: Proceedings of the 2007 International Conference on Innovations in Information Technology, pages 337-341, Dubai, United Arab Emirates, Nov. 2007.
- 15. M. Pantoja, N. Ling, and W. Shang, “Coefficient Conversion for Transform Domain VC-1 to H.264 Transcoding,” Proceedings of the 2007 IEEE Workshop on Signal Processing Systems (SiPS), Shanghai, China, October 17 - 19, 2007.
- 16. S. Schaeckeler and W. Shang, “Stack size reduction of recursive programs,” Proceedings of the 2007 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '07), Salzburg, Austria, September 30 - October 3, 2007, ACM Press, New York, NY, pp. 48-52. DOI= http://doi.acm.org/10.1145/1289881.1289892
- 17. Jun Zhang, Xiaoquan Yi, Nam Ling, and Weijia Shang, “Chroma Coding Efficiency Improvement with Context Adaptive Lagrange Multiplier (CALM),” Proceedings of the 2007 IEEE International Symposium on Circuits and Systems (ISCAS), New Orleans, Louisiana, USA, pp. 293 - 296, May 27 - 30, 2007.
- 18. S. Subha and W. Shang, “Variable Block Size Architecture for Matrix Multiplication,” Proceedings of 3rd International Conference Obcom-2006, Vellore, India, December 2006, pp. 187-191.
- 19. J. Zhang, X. Yi, N. Ling, and W. Shang, “Context adaptive Lagrange multiplier (CALM) for motion estimation in JM -- Improvement,” JVT-T046, Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, 20th Meeting, Klagenfurt, Austria, July 17– 21, 2006.
- 20. S. Schäckeler and W. Shang, “Stack Compaction for Memory Constrained Systems,” IEEE International Conference on Computing and Informatics, June 2006, Kuala Lumpur, Malaysia.
- 21. J. Zhang, X. Yi, N. Ling, and W. Shang, “Context adaptive Lagrange multiplier (CALM) for motion estimation in JM", JVT-S028, Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, 19th Meeting, Geneva, Switzerland, 31 March - 7 April 2006.
- 22. J. Zhang, X. Yi, N. Ling and W. Shang, “Bit Rate Distribution Analysis for Motion Estimation in H.264.” IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, Nevada, USA, January 9 - 11, 2006. G & F.
- 23. X. Yi, J. Zhang, N. Ling, and W. Shang, “Improved and simplified fast motion estimation for JM,” JVT-P021.doc, Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, 16th Meeting, Poznan, Poland, Jul. 24-29, 2005.
- 24. R. Grover, S. Krishnan and W. Shang. Performance Trade-offs of DCT with Variable Length Carry Chains in FPGAs. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA '04, June 21-24, 2004, Las Vegas, pp. 442-448
- 25. S. Subha and W. Shang, “On Data Locality in Supernode Transformation,” Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Vol. IV, Las Vegas, USA, June 23-26, 2003, pp. 1635-1641.
- 26. R. Grover, W. Shang, and Q. Li, "A Faster Distributed Arithmetic Architecture for FPGAs," Proceedings of 10th International Symposium on Field Programmable Gate Arrays, Monterey, California, February 24-26, 2002, pp. 31-39.
- 27. R. Grover, W. Shang, Q. Li, “A Comparison of FPGA Implementations of Two’s Complement Bit-Level and Word-Level Matrix Multipliers,” Poster session, FPGA 2001, Monterey California, Feb. 11-13, pp.223.
- 28. R. Grover, W. Shang, and Q. Li, "A Comparison of FPGA implementations of Bit-level and word-level Matrix Multiplier," Proceedings of 10th International Conference on Field Programmable Logic and Applications, Villach, Austria, August 27-30, 2000, pp. 422-431.
- 29. R. Grover, W. Shang, and Q. Li, "An improved Architecture for Bit-level Matrix Multiplication," Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Vol. IV, Las Vegas, USA, June 26-29, 2000, pp. 2257-2264.
- 30. E. Hodzic, and W. Shang, "On Supernode Shape", Proceedings of the Eighth International Workshop on Compilers for Parallel Computers, pp. 367-380, Aussois, France, January 4-7, 2000.
- 31. E. Hodzic, and W. Shang "On Time Optimal Supernode Shape,” Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, June 28 – July 1, 1999, pp. 2019-2026.
- 32. P. Wang, W. Shang and G. Lamble, “Mapping a Parallel Bit-Level Matrix Multiplication Algorithm to FPGA Chips,” IASTED International Conference on Modelling and Simulation, May 5, 1999.
- 33. E. Hodzic, and W. Shang "Time Optimal Supernode Shape for Algorithms with n Extreme Dependence Directions," International Conference on Parallel and Distributed Computing and Networks, Australia, December 1998.
- 34. W. Shang, M. Ghanta, O. Hellwig, and E. Hodzic, “Estimating an Optimal Number of Processors and Message Vector Size for Special Classes of Computations,” Proceedings of Euro-PDS 98, Austria, July 1-3, 1998.
- 35. E. Hodzic and W. Shang, "On Supernode Partitioning Hyperplanes for Two Dimensional Algorithms," Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN '97), Singapore, August 11-13, 1997, pp. 83 - 88.
- 36. E. Hodzic and W. Shang, "On Supernode Transformation with Minimized Total Running Time," Proceedings of The International Conference on Application Specific Systems, Architectures and Processors, Chicago, IL., August 19-21, 1996, pp. 402 – 414
- 37. E. Hodzic and W. Shang, "On Optimal Size and Shape of Supernode Transformation," Proceedings of 1996 International Conference on Parallel Processing, Chicago, IL., August, 1996, Vol. III, pp. 25 - 34.
- 38. M.A. Schaar, K. Efe and W. Shang, "Queuing Performance Analysis of Co-Scheduling in a Pool of Processor Environment,” Proc. of Int’l Conference on Supercomputing, July 1994, Manchester, England, U.K. pp. 313-322.
- 39. W. Shang and Z. Shu, "Data alignment of nested loops without nonlocal communications," Proc. of IEEE Int’l Conference on Application Specific Array Processors, edited by P. Cappello et al, August 1994, pp. 439-450.
- 40. Z. Xing and W. Shang, "An Algorithm for Accurate Data Dependence Test,” Proc. IEEE Int’l Conference on Application Specific Array Processors, edited by L. Dadda and B.W. Wah, Oct. 1993, Italy, pp. 404 - 415.
- 41. W. Shang and B.W. Wah, "Dependence Analysis and Architecture Design for Bit-Level Algorithms,” Proc. 22nd Int’l Conference on Parallel Processing, Aug. 1993, St. Charles, Illinois, pp. 30-38.
- 42. Z. Chen and W. Shang, "Mapping Uniform Dependence Algorithms onto Fixed Size Processor Arrays,” Proc. 7th Int’l Parallel Processing Symposium, Newport Beach, California, April 1993, pp. 804-809.
- 43. Z. Xing, W. Shang and S. Xiao, "A Piecewise Linear Programming Approach for Data Dependence Analysis," Proc. 6th SIAM Conference on Parallel Processing for Scientific Computing, Norfolk, Virginia, March 1993, pp. 829-835.
- 44. Z. Xing and W. Shang, "Interval Test: An Application of Interval Analysis in Data Dependence Test,” presented at Int’l Conference on Numerical Analysis with Automatic Result Verification, Mathematics, Applications and Software, Lafayette, LA, Feb. 1993.
- 45. Z. Chen and W. Shang, "On Uniformization of Affine Dependence Algorithms," Proc. IEEE Fourth Symposium on Parallel and Distributed Processing, Arlington, TX, Dec. 1992, pp. 128-137.
- 46. Z. Yang, W. Shang and J.A.B. Fortes, "Conflict-Free Scheduling of Nested Loop Algorithms on Lower Dimensional Processor Arrays," Proc. 6th IEEE Int'l Parallel Processing Symposium, March 1992, Beverly Hills, CA., pp. 156-164.
- 47. W. Shang, M.T. O'Keefe and J.A.B. Fortes, "On Loop Transformation for Generalized Cycle Shrinking,” Proc. Int'l Conf. on Parallel Processing, Aug. 1991, St. Charles, Illinois, pp. 132-141 (II).
- 48. W. Shang and J.A.B. Fortes, "Exploiting Parallelism with Linear Schedules," Proc. Int'l Conference for Young Computer Scientists, July 18 - 20, 1991, Beijing, China.
- 49. W. Shang, M.T. O’Keefe and J.A.B. Fortes, "Generalized Cycle Shrinking,” Proc. Int’l Conference on Algorithms and Parallel VLSI Architectures, II, North-Holland, Amsterdam 1991.
- 50. W. Shang and J.A.B. Fortes, "Mapping Algorithms onto Parallel Architectures: Time Schedules," Proc. Second Int'l IEEE Specialist Seminar on "The Design and Application of Parallel Digital Signal Processors," Lisbon, Portugal, April 15 - 19, 1991.
- 51. W. Shang and J.A.B. Fortes, "Time Optimal and Conflict-free Mappings of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays," Proc. Int’l Conf. on Parallel Processing, Aug. 1990, St. Charles, Illinois, pp. 101-110 (I).
- 52. W. Shang and J.A.B. Fortes, "Time Optimal Linear Schedules for Algorithms with Uniform Dependencies,” Proc. Int’l Conf. on Systolic Arrays, May 1988, San Diego, California, pp. 392-402.
- 53. W. Shang and J.A.B. Fortes, "Independent Partitioning of Algorithms with Uniform Dependencies," Proc. Int'l Conf. on Parallel Processing, Aug. 1988, St. Charles, Illinois, pp. 26-33 (II).
- 54. W. Shang and B.W. Wah, "Buffering in Macropipelines of Systolic Arrays," presented at Midwest VLSI Workshop, Jan. 1985, Ohio State University.
- 55. B.W. Wah, W. Shang and M. Aboelaze, "Design of Macropipelines of Systolic Arrays,” in Proc. IEEE Computer Society Workshop on Computer Architecture for Pattern Analysis and Image Database Management, Nov. 1985.

**Book Chapters:**

- J. Dou and W. Shang, “On Applications of Thermal Field Emission,” a book chapter in Surface Science Research, Nova Science Publishers, editor: Charles P. Norris, ISBN: 1-59454-159-0, 2004, pp. 1-20.
- J.A.B. Fortes, B.W. Wah, W. Shang and K.N. Ganapathy, "Algorithm-Specific Parallel Processing with Linear Arrays," Advances in Computers, edited by M. Yovits, Academic Press, Vol. 38, 1994, pp. 198-247.

**Technical Reports:**

- Z. Chen and W. Shang, "On Uniformization of Affine Dependence Algorithms," Technical Report No. 92-3-3, The Center for Advanced Computer Studies, The University of Southwestern Louisiana, LA, 70504, Sept. 1992.
- Z. Xing and W. Shang, "Polynomial Average Time and Exact Data Dependencies Analysis,” Technical Report No. 92-3-5, Center for Advanced Computer Studies, The University of Southwestern Louisiana, LA, 70504, Dec. 1992.
- Z. Yang, W. Shang and J.A.B. Fortes, "Conflict-Free Scheduling of Nested Loop Algorithms on Lower Dimensional Processor Arrays,” Technical Report No. 92-3-1, The Center for Advanced Computer Studies, The University of Southwestern Louisiana, LA, 70504, Jan. 1992.
- W. Shang and J.A.B. Fortes, "Partitioning of Uniform Dependency Algorithms for Parallel Execution on MIMD/Systolic Systems,” Technical Report No. 88-18, Purdue University, W. Lafayette, IN 47907, April 1988 (34 pages).
- W. Shang and J.A.B. Fortes, "Time-Optimal and Conflict-free Mappings of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays," Technical Report No. 90-29, Purdue University, W. Lafayette, IN 47907, April 1990 (31 pages).
- W. Shang, "Scheduling, Partitioning and Mapping of Uniform Dependence Algorithms on Processor Arrays," Ph.D. Thesis, May 1990, Purdue University, W. Lafayette, IN 47907 (236 pages).

**Invited Presentations:**

- W. Shang, "On Time Optimal Supernode Shape,” invited by the Eighth Workshop on Compilers for Parallel Computers, January 4-7, 2000, Aussois, France.
- W. Shang, "On Optimal Size and Shape of Supernode Transformations," presented at the Dagstuhl-Seminar 9616 "Loop Parallelization," Germany, April 15-19 1996.
- W. Shang, "From Algorithm to FPGAs," presented at Xilinx, April 10, 1996.
- W. Shang, "Minimizing Communication Overheads in Parallel Systems," presented in Department of Computer Science, San Jose State University, September 1994.
- W. Shang, "Fundamental Problems in Parallel Programming," presented at Beijing University of Aeronautics and Astronautics and Tsinghua University, Beijing, China, July 1994.
- W. Shang, "Two Fundamental Problems in Programming Distributed Memory Parallel Computers," presented at NASA Ames, March 18, 1994.
- W. Shang, "Polynomial Average Time and Exact Data Dependence Analysis,” presented at the Dagstuhl-Seminar on Parallelization Techniques for Uniform Algorithms, Germany, June 1993 (also invited by European Computer Research Center in Munich, Germany).
- W. Shang, "On Programming Massively Parallel Processor Arrays," presented in Bellaire Research Center, Shell Development Company, Houston, TX, April 9, 1992.